ABSTRACT

The File Organization Project has made available a set of programs
designed to operate on large files of machine-readable bibliographic
records. These programs are designed as an instrument for
understanding and refining the techniques of bibliographic search.
This document discusses four aspects of the system: (1) the retrieval
program, CIMARON, an on-line, interactive system with two
complementary modes of operation, searching and browsing; (2) the
CIMARON2 terminal operator's guide, a step-by-step description of the
use of the system through an on-line computer terminal; (3) the
BROWSER2 terminal operator's guide, which describes an independent
routine used to scan currently stored index files, to save index terms
temporarily, and to obtain hard copy of the displayed terms; and (4) a
user's guide to file building.
FOREWORD

This report contains the results of the second phase (July, 1968 -
June, 1970) of the File Organization Project, directed toward the
development of a facility in which the many issues relating to the
organization and search of bibliographic records in on-line computer
environments could be studied. This work was supported by a grant
(OEG-1-7-071083-5068) from the Bureau of Research of the Office of
Education, U.S. Department of Health, Education, and Welfare and also
by the University of California. The principal investigator was M.E.
Maron, Professor of Librarianship and Associate Director, Institute of
Library Research; the project director and project manager were,
respectively, Ralph M. Shoffner and Allan J. Humphrey, Institute of
Library Research.

This report is being issued as seven separate volumes:

• Shoffner, Ralph M., Jay L. Cunningham, and Allan J. Humphrey.
  The Organization and Search of Bibliographic Records in On-line
  Computer Systems: Project Summary.

• Shoffner, Ralph M. and Jay L. Cunningham, eds. The Organization
  and Search of Bibliographic Records: Component Studies.

• Aiyer, Arjun K. The CIMARON System: Modular Programs for the
  Organization and Search of Large Files.

• Silver, Steven S. HTX: Interactive Assembler Language
  Interpreter Users' Manual.

• Silver, Steven S. Users' Guide to the Format Manipulation
  System for Natural Language Documents.

• Silver, Steven S. and Joseph C. Meredith. DISCUS Interactive
  System Users' Manual.

• Smith, Stephen F. and William Harrelson. TMS: A Terminal Monitor
  System for Information Processing.

Because of the joint support provided by the Information Processing
Laboratory Project (OEG-1-7-071085-4286) for the development of
DISCUS and of TMS, the volumes concerned with these programs are
included as part of the final report for both projects. Also, the
CIMARON system (which was fully supported by the File Organization
Project) has been incorporated into the Laboratory operation;
therefore, in order to provide a balanced view of the total facility
obtained, this volume is included as part of the Laboratory project
report. (See Maron, M.E. and Don Sherman, et al. An Information
Processing Laboratory for Education and Research in Library Science:
Phase 2. Institute of Library Research, 1971.)
GENERAL DESCRIPTION OF CIMARON SYSTEM

1.1 General Retrieval from Large Files

The File Organization Project* has made available a set of programs which
are designed to operate on large files of bibliographic records, typically
machine-form catalog entries for monographs. From the educational and research
point of view, these programs are designed as an instrument for understanding
and refining the techniques of bibliographic search. At present, access is
provided to two data bases in the MARC II record structure:

a. 95,000 records, representing approximately 65% of the holdings of the
   library of the University of California at Santa Cruz. By 1971,
   this file will grow to 120,000 records and represent over 80% of the
   Santa Cruz campus holdings.

b. 5,"00 records, representing a portion of the collection of the
   University Hospital, U.C., San Diego. This smaller file is focused
   almost entirely on medical topics.

The retrieval program, CIMARON, in common with other programs in the Information
Processing Laboratory, operates interactively with the students and researchers
who use it. The files are organized so that they can be searched "on-line,"
i.e., while the user waits. In most cases, searches are performed in less than
ten seconds.



Any search request may consist of a series of search keys connected by
Boolean operators and utilizing parentheses. Allowable search keys for a given
data base are specified at the time the data base is locked into the system.
For the San Diego file, four keys currently are allowable: Author, Subject,
Title, and Dolbyized Author.** For the Santa Cruz file, due to present
limitations of available disc space, only Author and Subject search keys are
permitted. In principle, other search keys such as Series, Publisher,
Publication Date, Class Number, Dewey Decimal No., etc., can be generated.
The search key lists are as follows:

    Current                        Planned

    AU/ - Author                   SE/ - Series
    TI/ - Title                    PU/ - Publisher
    SU/ - Subject                  PD/ - Publication Date
    AD/ - Dolbyized Author         CN/ - Class Number or Call Number
                                   DD/ - Dewey Decimal Number



In CIMARON, search requests consist of a set of Search Keys having an
explicit relationship between them. The user defines this relationship in
terms of three Boolean connectives: AND, OR, NOT. The meanings of these
connectives are as follows:

'AU/FREUD' AND 'SU/DREAMS'        (All books written by Freud and
                                  about dreams)

'AU/FREUD' OR 'SU/DREAMS'         (All books written by Freud, as
                                  well as all books about dreams
                                  including those written by Freud)

NOT 'AU/FREUD' AND 'SU/DREAMS'    (All books about dreams, except
                                  those written by Freud)

NOT 'AU/FREUD' AND 'AU/FREUD'     (The null set)

NOT 'AU/FREUD' OR 'AU/FREUD'      (The universal set)

Further, CIMARON allows parenthetic search requests to be formulated:

'SU/DREAMS' AND ('SU/FREUD' OR    (All books about Freud's or Jung's
'SU/JUNG')                        work on dreams)

('AU/FREUD' OR 'AU/JUNG') AND     (All books written by Freud or
('SU/DREAMS' OR 'SU/HYSTERIA')    Jung on either of the two subjects,
                                  dreams or hysteria)

*USDHEW, Grant No. OEG-1-7-071083-5068.

**Dolbyized Author Names refers to a process of associating names that are
similar phonetically but spelled differently: Tschaikovskii, Tshaikovsky,
Chaikowski, etc.
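
The meanings of the three connectives can be illustrated with ordinary set operations. The following Python sketch is not CIMARON code: the record numbers and the two-entry index are invented for illustration, and NOT is modeled as the complement against the whole file.

```python
# Hypothetical toy index: search key -> set of record numbers.
# None of these record numbers come from the CIMARON data bases.
ALL_RECORDS = set(range(1, 9))
INDEX = {
    "AU/FREUD": {1, 2, 3},
    "SU/DREAMS": {2, 3, 4, 5},
}


def lookup(key):
    """Return the set of records posted under a search key (empty if none)."""
    return INDEX.get(key, set())


def NOT(records):
    """NOT is taken as the complement relative to the entire file."""
    return ALL_RECORDS - records


freud = lookup("AU/FREUD")
dreams = lookup("SU/DREAMS")

by_freud_about_dreams = freud & dreams        # AND -> intersection
by_freud_or_about_dreams = freud | dreams     # OR  -> union
dreams_not_by_freud = NOT(freud) & dreams     # NOT is applied before AND
null_set = NOT(freud) & freud                 # always empty
universal_set = NOT(freud) | freud            # always the whole file
```

The last two lines correspond to the "null set" and "universal set" requests in the table above: whatever the index holds, a set intersected with its complement is empty, and a set joined with its complement is the entire file.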



With such a powerful variety of options available, CIMARON users are
able to explore a number of manual and computer-specific search strategies.
They are able to formulate comparisons between various manual and automated
methodologies related to search formulation and search expansion. In addition,
the users gain many direct insights into the structure of machine-form
bibliographic records, especially the relationship between the identification
of bibliographic data elements and the formulation of search requests.

The CIMARON program was planned as the core program within an expanding
system devoted to experimentation with organization and search of large files
of bibliographic data. As a result, CIMARON was designed as a modular
program with separate segments for:



a. selection of data base to be searched;

b. negotiation of the search request;

c. analysis of the search request and index file search;

d. report of the search results;

e. retrieval and display of the master records;

f. search iteration or termination.

Also, an internal data logging procedure has been developed to provide
extensive information about both the behavior of the users of the system and
the internal operation of the system itself.* Students may be interested in
this logging feature since it is anticipated that the data gathered will be
useful for many different types of analyses. The data logged includes the
identification of the user, the search request made, the amount of time spent
in searching the file, etc. (see Figure 1).

*The computer code for this procedure was developed but was not fully
operational at the time of writing this report.










FIG. 1: DATA COLLECTED BY CIMARON

Field #  Field Name  Bytes   Description

  1.     LRECLEN     2       Length of this log record in hex
  2.     LTERMNO     2       Terminal number in EBCDIC
  3.     LUNAME      4       Initials of user
  4.     LPNAME      8       Name of program
  5.     LDATE       4       Julian date in packed format
  6.     LINTIME     4       Time at entry in packed format
  7.     LSEQNO      2       Sequence number in packed format
  8.     LFLAGS      2       Bit flags indicating up to 16 conditions
  9.     LBCODE      2       Data-base code in EBCDIC
 10.     LDIACNT     2       Diagnostics count in packed format
 11.     LIDXCNT     2       No. of index records read in hex
 12.     LADXCNT     2       No. of master addresses read from disk in hex
 13.     LTRKCNT     2       No. of tracks read from disk in hex
 14.     LSRCCNT     3       Number of records reported after search in
                             packed format
 15.     LREPCNT     3       Number of records retrieved by user in
                             packed format
 16.     LQRTIM      4       Time at query entry in packed format
 17.     LSRCTIM1    4       Time at start of search in packed format
 18.     LSRCTIM2    4       Time at end of search in packed format
 19.     LRETTIM1    4       Time at start of retrieval in packed format
 20.     LRETTIM2    4       Time at end of retrieval in packed format
 21.     LOUTTIM1    4       Time at exit in packed format
 22.     LQRYLEN     2       Length of query in hex
 23.     LQRTXT      var.,   Text of the query
                     up to
                     256
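
To make the fixed layout of Figure 1 concrete, the following Python sketch computes byte offsets from the field widths and slices a raw record accordingly. It is an illustration only: the field spellings follow the figure as far as they are legible, the reading of LQRYLEN as an unsigned big-endian binary count is an assumption, and no attempt is made to decode the packed-decimal or EBCDIC contents.

```python
# (name, width in bytes) for the 22 fixed-width fields of Fig. 1.
# LQRTXT (variable, up to 256 bytes) follows the fixed part.
FIXED_FIELDS = [
    ("LRECLEN", 2), ("LTERMNO", 2), ("LUNAME", 4), ("LPNAME", 8),
    ("LDATE", 4), ("LINTIME", 4), ("LSEQNO", 2), ("LFLAGS", 2),
    ("LBCODE", 2), ("LDIACNT", 2), ("LIDXCNT", 2), ("LADXCNT", 2),
    ("LTRKCNT", 2), ("LSRCCNT", 3), ("LREPCNT", 3), ("LQRTIM", 4),
    ("LSRCTIM1", 4), ("LSRCTIM2", 4), ("LRETTIM1", 4), ("LRETTIM2", 4),
    ("LOUTTIM1", 4), ("LQRYLEN", 2),
]


def field_offsets(fields=FIXED_FIELDS):
    """Map each field name to (offset, width); also return the fixed length."""
    offsets, position = {}, 0
    for name, width in fields:
        offsets[name] = (position, width)
        position += width
    return offsets, position


def split_record(raw):
    """Slice one raw log record (bytes) into named byte strings."""
    offsets, fixed_length = field_offsets()
    fields = {name: raw[start:start + width]
              for name, (start, width) in offsets.items()}
    # Assumption: LQRYLEN is an unsigned big-endian binary count.
    query_length = int.from_bytes(fields["LQRYLEN"], "big")
    fields["LQRTXT"] = raw[fixed_length:fixed_length + query_length]
    return fields


OFFSETS, FIXED_LENGTH = field_offsets()
```

Under the widths listed in the figure, the fixed part of the record comes to 70 bytes, so a record carrying a full 256-byte query would occupy 326 bytes.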



CIMARON is a highly sophisticated program in its design and operation.
However, some expertise also is required of the user, since some of the
program features are rudimentary as yet. For example, the user must know
the format to use when submitting requests, since the present request
negotiation phase of the program is limited to testing the syntax of
the request rather than otherwise assisting him in formulating his request.

This document is intended both to serve as an adequate introduction
for new users of the system and to provide a general description of the
system to those with interest in bibliographic storage and retrieval systems.
Subsequent chapters of this volume contain users' guides for these programs
which specify the exact format of commands, sequence of operations, etc.
The remaining sections of this chapter will describe CIMARON and the file
building programs at an intermediate level of detail.

1.2 Searching and Browsing

The CIMARON system may be conceptualized as having two distinct but
complementary modes of operation: searching and browsing. The search mode
has as its object the formulation of retrieval requests, the evaluation of
requests against a master file of indexed records, and the retrieval of a
relevant subset of records. The browsing mode has as its goal the examination
of the CIMARON index files, the extraction of appropriate index file
entries, and the ultimate utilization of index file entries as components
(i.e., terms) of search requests.

The browsing mode is crucial in the CIMARON system because it helps the
user maintain control of search operations. By browsing, the user can ascertain:

• legitimate non-empty terms for search requests

• variations in the representation of legitimate search request terms

• exact format (punctuation, abbreviations, spelling, etc.) for search
  request terms

• which terms will lead to overflow conditions in retrieval requests

Browsing thus can be considered a part of the analysis which precedes
actual searching and which is necessary to avoid inaccurate, illegitimate,
or inappropriate search request terms.

While the capability of browsing through index files was conceived as
part of the original CIMARON design, it has not been implemented as part
of the initial CIMARON code. However, this capability is provided through
the use of Browser, an independent routine, originally coded to aid
programmer debugging. The operation of Browser will not be discussed
here, both because its operation is straightforward and because it
represents a temporary implementation of the CIMARON system. Its operating
instructions are provided following those of CIMARON2.

1.3 The Formulation of CIMARON Search Requests

Because of the hardware and software resources available in the
Information Processing Laboratory, CIMARON is able to operate in real-time,
to communicate bi-directionally with users, and to utilize the Sanders
Cathode Ray Tube (CRT) terminal video screens to obtain the search
specification and to format and display its retrieval results. We use
the general term "interactive mode" to cover all these aspects of request
formulation, real-time search, immediate display of retrieval results, and
user-program communication. This "interactive mode" is distinguished from
"batch mode" processing by the immediacy of the communication cycle. In
CIMARON, the interactive mode is used to select the data base and search key
file, to formulate the search request, to choose the format of the retrieval
display, and to control the viewing of the retrieved records.

CIMARON begins with a descriptive summary of its data bases and Search
Key Files. The user is asked to select a data base (SD for San Diego, SC for
Santa Cruz), and CIMARON then opens the appropriate files of master file
records plus associated Search Key files.

CIMARON then asks the user for a search request. A search request has
a precise syntactic definition which is given in Figure 2 in Backus-Naur
Form (BNF) notation. The basic syntactic components of the search request
are:

a. Name of Search Key File (i.e., AU, TI, SU, AD)

b. Specification of search key value (e.g., FREUD or PSYCHOANALYSIS)

c. Boolean connectives between search terms (e.g., AND, OR, NOT)

d. Punctuation [e.g., apostrophe ', slash /, reverse slash \,
   left paren (, or right paren )].

Apostrophes are used as left and right brackets around a search term. A
slash is used to separate the name of the Search Key file from the search
key value. Reverse slashes are used to bracket hexadecimal values to be
entered into the search key value. (This is used to specify symbols which
cannot be entered legitimately via the terminal keyboard, such as subfield
delimiters \FA\ or an apostrophe within the search key value.)



Examples of CIMARON search requests are:

'AU/FREUD, SIGMUND'                  (All books authored or co-authored
                                     by Sigmund Freud)

'AU/FREUD, SIGMUND' OR 'SU/FREUD,    (All books by or about Sigmund Freud)
SIGMUND'

'SU/PSYCHOTHERAPY' AND ('AU/FREUD'   (All books dealing with psychotherapy
OR 'AU/JUNG' OR 'AU/ADLER')          and written by either Freud, Jung,
                                     or Adler)

Before executing a Search Request, CIMARON checks the syntactic validity
of the request and rejects it if: (1) there is an uneven number of
apostrophes; (2) the Search Key file is incorrectly specified; or (3) an
unknown Boolean operator is used. The user is notified of the type of error
and is given the option of correcting the search request, which will then be
checked again for validity.
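
The three rejection conditions can be sketched as a small pre-check. This Python fragment only illustrates the checks named above and is not the CIMARON routine; the function name, the wording of the messages, and the approach of splitting the request on apostrophes are all assumptions.

```python
import re

KEY_FILES = {"AU", "TI", "SU", "AD"}   # search key files named in the text
OPERATORS = {"AND", "OR", "NOT"}


def check_request(request):
    """Return a list of error messages; an empty list means 'acceptable'."""
    errors = []
    # (1) apostrophes must pair up, since they bracket each search term
    if request.count("'") % 2 != 0:
        errors.append("uneven number of apostrophes")
        return errors  # terms cannot be delimited reliably past this point
    # Splitting on apostrophes alternates outside text and quoted terms.
    pieces = request.split("'")
    terms, between = pieces[1::2], pieces[0::2]
    # (2) every term must begin with a known key file name and a slash
    for term in terms:
        name, slash, _value = term.partition("/")
        if not slash or name.upper() not in KEY_FILES:
            errors.append(f"search key file incorrectly specified: '{term}'")
    # (3) anything outside the quotes must be an operator or a parenthesis
    for chunk in between:
        for word in re.findall(r"[^\s()]+", chunk):
            if word.upper() not in OPERATORS:
                errors.append(f"unknown Boolean operator: {word}")
    return errors
```

A request such as `'AU/FREUD' XOR 'SU/DREAMS'` would be reported under condition (3), after which the user could correct and resubmit it. The sketch does not check parenthesis balance or operator placement, which the text does not list among the rejection conditions.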




FIG. 2: CIMARON SEARCH REQUEST SYNTAX
DEFINED IN BACKUS-NAUR FORM NOTATION

 1.  <search request>  ::=  <boolex>

 2.  <boolex>          ::=  <term> | <boolex> OR <term>

 3.  <term>            ::=  <factor> | <term> AND <factor>

 4.  <factor>          ::=  <operand> | NOT <operand>

 5.  <operand>         ::=  <search key> | ( <boolex> )

 6.  <search key>      ::=  ' <attrib. code> / <string> '

 7.  <attrib. code>    ::=  AU | TI | SU | AD

 8.  <string>          ::=  <alphanum> | \<hex>\ | <string><alphanum> |
                            <string>\<hex>\

 9.  <alphanum>        ::=  any string of EBCDIC characters excluding
                            apostrophe and reverse slash*

10.  <hex>             ::=  any string of hexadecimal digits, comprised
                            of legitimate 2-character hexadecimal
                            numbers, e.g., F0.

Terminal Types

     OR        AU
     AND       TI
     NOT       SU
     (         AD
     )         EBCDIC characters
     '         Hexadecimal digits
     /

*Note that if an apostrophe, ', is to be included in the alphanumeric
string, the hex representation for it must be provided. Otherwise
it defines the end of the <search key>. Reverse slash, \, must be
treated similarly.
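
The grammar of Figure 2 maps naturally onto a recursive-descent parser. The Python sketch below is one possible reading of rules 1 through 7 and is not the CIMARON analyzer: the left-recursive rules are written as loops, the tuple-based tree shape is an assumption, and the hexadecimal escapes of rule 8 are passed through as ordinary text rather than decoded.

```python
import re

# A token is a quoted search key, a parenthesis, or a word (operator).
TOKEN = re.compile(r"\s*('[^']*'|\(|\)|[A-Za-z]+)")
ATTRIB_CODES = {"AU", "TI", "SU", "AD"}


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """<search request> ::= <boolex>"""
    tree, rest = parse_boolex(tokenize(text))
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def parse_boolex(tokens):
    """<boolex> ::= <term> | <boolex> OR <term>"""
    left, tokens = parse_term(tokens)
    while tokens and tokens[0].upper() == "OR":
        right, tokens = parse_term(tokens[1:])
        left = ("OR", left, right)
    return left, tokens


def parse_term(tokens):
    """<term> ::= <factor> | <term> AND <factor>"""
    left, tokens = parse_factor(tokens)
    while tokens and tokens[0].upper() == "AND":
        right, tokens = parse_factor(tokens[1:])
        left = ("AND", left, right)
    return left, tokens


def parse_factor(tokens):
    """<factor> ::= <operand> | NOT <operand>"""
    if tokens and tokens[0].upper() == "NOT":
        operand, tokens = parse_operand(tokens[1:])
        return ("NOT", operand), tokens
    return parse_operand(tokens)


def parse_operand(tokens):
    """<operand> ::= <search key> | ( <boolex> )"""
    if not tokens:
        raise SyntaxError("search request ends unexpectedly")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        tree, rest = parse_boolex(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("missing right parenthesis")
        return tree, rest[1:]
    if head.startswith("'"):
        code, slash, value = head[1:-1].partition("/")
        if not slash or code.upper() not in ATTRIB_CODES:
            raise SyntaxError(f"bad search key {head}")
        return ("KEY", code.upper(), value), rest
    raise SyntaxError(f"unexpected token {head!r}")
```

Because NOT is handled in the innermost rule, AND in the middle rule, and OR in the outermost rule, the resulting tree reflects the grammar's layering: NOT binds most tightly, then AND, then OR, with parentheses overriding all three.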

The device used for user-to-program transmission is the terminal
keyboard, with the CRT screen serving as a visual copy of what is being
formulated and transmitted. The program-to-user communication is via the
CRT screen. Because there are no mechanical linkages in this system
(except for typing on the terminal keyboard), the interactive cycle is
very rapid, usually on the order of less than two seconds. CIMARON may
be in use at all three terminals at the same time, the small increase in
delay time being due primarily to tying up the single telephonic link
between the Laboratory and the Computer Center.

1.4 CIMARON Search Logic

When the user enters an acceptable search request, CIMARON begins
operating according to search logic designed to maximize search efficiency.
This attempt to maximize occurs at two levels. First, the search request
is divided into components (corresponding to the terms of the search
request), and the processing of these components is ordered so that the
minimum evaluation of Boolean expressions is required. Second, the actual
search operation is performed against the Access File (which is a sorted
file), and Master File records are not examined at this stage of processing.
The Access File contains the Search Key Values previously extracted by the
File Generation subsystem and pointers to the master records. Thus,
exhaustive search of the master records need not be performed by CIMARON.

The Boolean search request is treated by CIMARON as if it were an
arithmetic expression; that is, there is a precedence order for evaluating
expressions. That order is NOT, AND, OR. Further, all expressions within
parentheses are evaluated before those without. Any search request thus can
be treated as a simple binary tree. Each node of the tree is a Boolean
operation, with the tree leaves corresponding to the search terms in the
search request expression. For example: 'AU/BACH' AND ('SU/SUITE' OR
'SU/CANTATA') can be represented as follows:

              AND
             /   \
         BACH     OR
                 /  \
            SUITE    CANTATA

Note that without the parentheses, the expression 'AU/BACH' AND 'SU/SUITE'
OR 'SU/CANTATA' would produce the following binary tree:

                OR
               /  \
            AND    CANTATA
           /   \
       BACH     SUITE

Thus the search logic can be seen in the following way. Beginning
with the lowest level of the tree structure, a search is conducted for the
term in the left leaf of the node. The search consists of both examining
the appropriate Search Key Access* file and retrieving a list of addresses
of Master File records which satisfy the Search Key value of the search
request term. This list is called Left-List, and it is sorted into
ascending order by Master File location address. A similar operation is
performed for the right leaf of the node, and the resulting list of Master
File addresses is sorted and stored in Right-List.

Once these two lists have been generated, they are combined into a
third Result-List according to the Boolean logic specified at the node.
If the node operator is OR, then Left-List and Right-List are additively
combined, except that duplicate Master File addresses are combined into
one entry. If the node operator is AND, then the Result-List consists
only of Master File addresses that are present in both Left-List and
Right-List (i.e., that are duplicated in both lists).
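
Because both lists are held in ascending address order, they can be combined in a single pass. The following Python sketch shows one way to do this; it is an illustration of the OR and AND rules just described, not the CIMARON code, and it assumes that neither input list contains internal duplicates.

```python
def combine(left_list, right_list, operator):
    """Merge two ascending address lists into a Result-List under OR or AND."""
    if operator not in ("OR", "AND"):
        raise ValueError(f"unsupported node operator: {operator}")
    result, i, j = [], 0, 0
    while i < len(left_list) and j < len(right_list):
        a, b = left_list[i], right_list[j]
        if a == b:                  # address present in both lists
            result.append(a)        # kept once, for OR and for AND alike
            i += 1
            j += 1
        elif a < b:
            if operator == "OR":
                result.append(a)
            i += 1
        else:
            if operator == "OR":
                result.append(b)
            j += 1
    if operator == "OR":            # leftovers belong to the union only
        result.extend(left_list[i:])
        result.extend(right_list[j:])
    return result
```

The Result-List comes out in ascending order as well, so it can serve directly as the Left-List or Right-List of the next node up the tree. Addresses may be plain integers or (track, offset) pairs; any values that compare consistently will do.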

1.5 CIMARON Retrieval Display

Once CIMARON has completed execution of its search and retrieval
logic and has created a final result-list of entries to be retrieved, it
presents the requestor with a further set of options concerning the display
of retrieved records. The user may specify briefly the number of records
to retrieve and the format in which they should be displayed. After
display of the records has begun, he may move forward or backward in the
record display, ask for hard copy of a displayed record, or terminate the
display. At that point he may either initiate another request or exit.

Four CIMARON messages are possible with regard to retrieval results:

a. NO RECORDS SATISFY REQUEST

b. XXX RECORDS SATISFY REQUEST - HOW MANY ARE DESIRED?

c. ALL EXCEPT XXX RECORDS SATISFY REQUEST - TYPE NONE OR XCEPT
   FOR THE EXCEPTIONS

d. CONGRATULATIONS - YOUR REQUEST SPECIFIES THE ENTIRE FILE

Message (a) indicates a zero result-list (no records match the search
prescription and no records can be retrieved). The user is then allowed to
exit or to reformulate his request by either altering the current request
or submitting an entirely new request. The same options are offered to
the user after message (d), which indicates a failure of a different type,
namely that the search result encompasses the entire file (for example:
'AU/BACH' OR NOT 'AU/BACH'). Message (b) is a more common result and gives
the student an idea of how many records are potential candidates for
retrieval and display. Message (c) is similar, but the class of retrieval
records in this case is the negation of (or exception to) the search request.
For example, NOT 'AU/BACH', if taken literally, would result in specifying
the entire file less those few titles authored by BACH. It is presumed
that the student will wish to display the exceptions, namely NOT (NOT
'AU/BACH'). The response to messages (b) and (c) can specify:



*Recall that the Access File consists of Search Key values and linkages to
the addresses of Master File records.
(null)    (null response is the same as ALL)

NONE      (no records to be displayed)

ALL       (all retrieval candidates to be displayed)

XXX       (a three digit number, indicating how many records to display)

Any specification (except NONE) can be overridden during the display to
terminate the display process.

When the user indicates the number of records to be displayed, the
record display is initiated with a format default option (USER format)
which presents all displayable data in the master record. This default
option can be overridden at any time. Currently, two format options are
available: (1) machine-form MARC II, specified as MARC, and (2) user's MARC
II, specified as USER. The first option displays records in their original
machine format, i.e., record leader, record directory, and data fields, in
that order. This format is useful for students who are interested in
analyzing the components of the machine-form MARC record. CIMARON is
structured to accept other format display routines in the future.

The user format available is very similar to a normal catalog card,
except that the major (MARC-defined) data fields each are printed on a
separate line.* Each line begins with a short mnemonic identifying the
contents of the line. These identifying codes are:

REC   (accession number, publication date and call number)

MEH   (main entry heading)

TIT   (title)

IMP   (imprint)

PAG   (collation statement)

SER   (series notes)

NOTE  (other notes)

SUB   (subject tracings)

OTH   (author, title or series tracings)

A typical record under this format appears as Figure 3.

Records are displayed in disc address order and, therefore, are not
alphabetized. At the end of each record display, the student has the
following options: create a hard-copy version of the record currently
displayed; continue to the next record in the display sequence; skip
forward or backward in the sequence; kill the display and return to the
request formulation stage; or change display format.

*Accession number, publication date and call number all are printed on the
first line, in that order.




FIG. 3: CIMARON DISPLAY RECORD EXAMPLE

REC:   0001234  1965  BL1032 B4
MEH:   BELLAH, ROBERT NEELLY, 1927
TIT:   RELIGION AND PROGRESS IN MODERN ASIA. EDITED BY ROBERT N. BELLAH
IMP:   NEW YORK, FREE PRESS, 1965
PAG:   XXV, 246 P., 22 CM.
NOTE:  REPORTS OF A CONFERENCE HELD IN MANILA IN 1963 UNDER THE AUSPICES
       OF THE CONGRESS FOR CULTURAL FREEDOM
SUB:   ASIA--RELIGION
SUB:   RELIGION AND SOCIOLOGY
SUB:   ECONOMIC DEVELOPMENT
OTH:   CONGRESS FOR CULTURAL FREEDOM


FIG. 4: CIMARON COMMANDS AVAILABLE DURING RECORD DISPLAY PHASE

---     (null response, display next record)

KILL    (go to request formulation/exit branch; return to the display
        can still be accomplished through command Bn)

Bn      (display the nth record back in the list; n=0, or n>current
        position provides the first record)

SKn     (skip n records and display record following; n=0, or n>remaining
        records provides the last record)

HARD    (send a copy of this record to the printer file, no change
        in display)

MARC    (display current and following records in MARC II communications
        format)

USER    (display current and following records in user format)
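
The boundary rules for the Bn and SKn commands can be sketched as a small position calculation. The Python function below is one interpretation of the descriptions in Figure 4, not the CIMARON code: positions are taken to be 0-based indexes into the result list, and a skip that would run past the end is clamped to the last record.

```python
def next_position(command, position, total):
    """Return the new 0-based display position after a Bn or SKn command.

    position: index of the record now on the screen
    total:    number of records selected for display
    """
    command = command.strip().upper()
    if command.startswith("SK"):
        n = int(command[2:] or 0)
        remaining = total - 1 - position      # records after the current one
        if n == 0 or n > remaining:
            return total - 1                  # last record
        # skip n records, then show the one that follows (clamped)
        return min(position + n + 1, total - 1)
    if command.startswith("B"):
        n = int(command[1:] or 0)
        if n == 0 or n > position:
            return 0                          # first record
        return position - n
    raise ValueError(f"not a Bn or SKn command: {command!r}")
```

For instance, with ten records selected and the sixth on the screen (position 5), B2 returns to the fourth record and SK2 moves on to the ninth.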
FIG. 5: GENERAL CIMARON COMMANDS

                               Program Phase

            1            2            3          4            5
Commands    Data Base    Request      Results    End Results  File
            Selection    Formulation  Display    Display      Closing

Finis       -            -            -          -            Terminate
                                                              CIMARON

Exit        -            -            -          Go to        -
                                                 Phase 5

Reopen      -            -            -          -            Go to
                                                              Phase 1

SD          San Diego    -            -          -            -

SC          Santa Cruz   -            -          -            -

(null)      -            -            Show next  Go to        Terminate
                                      record     Phase 2      CIMARON

1.6 File Generation and Organization

Before CIMARON can be used, there must be a data base created on
which it can operate. Thus, a sequence of programs is required to
transform a random collection of machine-form bibliographic records into
well-organized files which are easily searchable by CIMARON. These
programs do not operate interactively, nor do they provide real-time response
capability. Rather, they operate in a batch-mode job stream, and the major
successive steps are:

a. loading the basic Master File records into disc memory (FILOR)

b. extracting indicated Search Keys from each individual record
   (ZODIAC)

c. special search key generation (e.g., DOLBY)

d. sorting the collection of Search Keys (OS Sort)

e. consolidating Search Keys into a linked file structure (PAX)

The first step in this job stream is to construct on disc (at present,
the IBM 2314) a linear array of the master file of input records. No
ordering or pre-sorting of the input tape is required. The program reads
in streams of variable length records, blocked or unblocked, and develops
disc storage algorithms based upon the four-byte binary record descriptor
word (RDW) in the standard IBM location. Possible input sources would be
either magnetic tape or a sequential file on disc.

This program, called FILOR, creates two output files. The first is
the Master File loaded sequentially onto the disc. These records are packed
to the highest density possible; that is, they are blocked to the full 2314
track capacity of 7294 bytes. Because of this dense packing, some records
may be split across two disc tracks. Although this increases programming
complexities in the record display portion of CIMARON, the net savings in
storage space is considerable due to the large size of bibliographic records.

The second file created by FILOR is a sequential list of the disc
addresses of each Master File record. The data in each record of this Finder
File consists of four elements:

a. Master File record number (sequentially assigned)

b. disc track number

c. relative position of the master file record within the disc
   track (this is termed the offset)

d. length of the master file record

There is one record in the Finder File for each Master File record.
The data in the Finder File is used in subsequent programs to retrieve
the records in the Master File.
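
The relation between dense packing and the four Finder File elements can be sketched as follows. This Python fragment is a simplified model rather than FILOR itself: it assumes records are laid end to end with no per-block overhead, that record numbers start at 1, and that tracks are numbered from 0, none of which is stated in the text.

```python
TRACK_CAPACITY = 7294   # bytes per IBM 2314 track, as given in the text


def build_finder_file(record_lengths, track_capacity=TRACK_CAPACITY):
    """Pack records end to end and return one Finder File entry per record:
    (record number, track number, offset within track, record length)."""
    finder, position = [], 0              # position = bytes written so far
    for number, length in enumerate(record_lengths, start=1):
        track, offset = divmod(position, track_capacity)
        finder.append((number, track, offset, length))
        position += length                # next record starts right after
    return finder


def spans_two_tracks(entry, track_capacity=TRACK_CAPACITY):
    """True if the record runs past the end of the track it starts on."""
    _number, _track, offset, length = entry
    return offset + length > track_capacity
```

With three 3000-byte records, the third begins at offset 6000 of the first track and spills into the next one, which is the split-record case that the display portion of CIMARON has to handle.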

The second step in the file organization process of CIMARON is to
analyze each Master File record and to extract Search Keys from each record.
This process is performed by ZODIAC, which is organized to process MARC II
structured records.* The definition of what constitutes a Search Key is
controlled by parameter cards which are stated in terms of a set of MARC II
major field tags. An example would be the generation of author (AU) Search
Keys, consisting of:



    100    Personal Name    Main Entry
    110    Corporate        Main Entry
    111    Conference       Main Entry
    130    Uniform Title    Main Entry
    700    Personal Name    Added Entry
    710    Corporate        Added Entry
    711    Conference       Added Entry
    730    Uniform Title    Added Entry



The definition of the Search Key also could be narrowed, for example, by
the exclusion of Corporate, Conference, and Uniform Title headings, or by
the exclusion of Added Entries. Similarly, the definition could be expanded
by adding Series Headings (400, 410, 411, 800, 810, 811) and/or Subject
Tracings for Personal, Corporate, Conference, or Uniform Title names (600,
610, 611, 630).

The extraction of Search Keys occurs by using as input the two files
generated in the previous pass, i.e., the disc-stored Master File and the
Finder File. The Finder File record and its corresponding disc-stored
Master File record are processed together. Each Master File record is
analyzed for the existence of one or more Search Keys. For each Search Key
found, a fixed-length output record is generated consisting of:

a. Search Key (all upper case)

b. Field Tag which caused the generation of Search Key

c. disc track number of Master File record

d. offset of record location within specified disc track

e. length of Master File record

A Master File record of course may have more than one Search Key, in which
case multiple output records are generated.
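
The extraction pass can be sketched in a few lines. The Python fragment below is an illustration only, not ZODIAC: a master record is modeled as a list of (tag, text) pairs rather than a real MARC II record, the AU tag set is the one listed above, the SU tag set is an assumption made for the example, and a file name is placed at the front of each output row on the assumption that later steps need to know which Search Key file a row belongs to.

```python
# Tag sets per Search Key file. AU follows the list given in the text;
# the SU tags are an illustrative assumption only.
KEY_FILES = {
    "AU": {"100", "110", "111", "130", "700", "710", "711", "730"},
    "SU": {"650", "651"},
}


def extract_keys(master_record, finder_entry, key_files=KEY_FILES):
    """Return one output row per Search Key found in a master record.

    master_record: list of (tag, field text) pairs (stand-in for MARC II)
    finder_entry:  (record number, track, offset, length)
    Each row: (file name, search key, field tag, track, offset, length)
    """
    _number, track, offset, length = finder_entry
    rows = []
    for tag, text in master_record:
        for file_name, tags in key_files.items():
            if tag in tags:
                # Search Keys are stored in upper case
                rows.append((file_name, text.upper(), tag,
                             track, offset, length))
    return rows
```

A record with one author field and one subject field therefore yields two rows, both carrying the same track, offset, and length, since both point back to the same Master File record.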

ZODIAC will accept definitions of up to twelve Search Key files, and
will, therefore, generate in a single pass Search Key files for author,
title, subject, etc. Currently there are two constraints to the program.
First, if several Search Key files are defined in a single ZODIAC program
pass, the field tags which comprise a Search Key file must be unique and
mutually exclusive. Thus, if one were creating two separate Search Key files,
one for Author (AU) and one for Subject (SU), the 600 field (Personal Name
Subject Tracing) could be allocated to one but not both of these files on a
single ZODIAC run. To achieve placement in both, a second run of ZODIAC must
be made with different parameter cards. The second constraint concerns the
ability to access subfields. In some situations, it would be useful to
exclude certain subfields from the Search Key (e.g., $e relator), or to
construct Search Keys from subfields altogether (e.g., a Key file consisting
only of geographic subject heading subdivisions). Currently, the level of
Search Key definition does not extend beyond the tagged field level.

*Library of Congress, Subscriber's Guide to the MARC Distribution Service,
Washington, D.C.: Information Systems Office, 1970, p. S6.
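The extraction pass and its first constraint can be illustrated with a short sketch. It is not the ZODIAC code: the record layout follows items a through e above, but the names SearchKeyRecord, build_tag_map, and extract_keys, and the representation of a Master File record as (tag, text) pairs, are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SearchKeyRecord(NamedTuple):
    file_name: str  # Search Key file the record belongs to, e.g. "AU"
    key: str        # Search Key, all upper case
    tag: str        # field tag which caused the generation of the key
    track: int      # disc track number of the Master File record
    offset: int     # offset of the record within that track
    length: int     # length of the Master File record


def build_tag_map(definitions: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each field tag to exactly one Search Key file for this pass."""
    if len(definitions) > 12:
        raise ValueError("at most twelve Search Key files per pass")
    tag_map: Dict[str, str] = {}
    for file_name, tags in definitions.items():
        for tag in tags:
            if tag in tag_map:
                # Tags must be mutually exclusive within a single run.
                raise ValueError(f"tag {tag} already assigned to {tag_map[tag]}")
            tag_map[tag] = file_name
    return tag_map


def extract_keys(fields: List[Tuple[str, str]],
                 address: Tuple[int, int, int],
                 tag_map: Dict[str, str]) -> List[SearchKeyRecord]:
    """Emit one output record per field whose tag is assigned to a key file."""
    track, offset, length = address
    return [
        SearchKeyRecord(tag_map[tag], text.upper(), tag, track, offset, length)
        for tag, text in fields
        if tag in tag_map
    ]
```

Under these assumptions, assigning the 600 field to both AU and SU in one call to build_tag_map is rejected, which mirrors the need for a second run with different parameter cards.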

Utilizing the output file of ZODIAC, other routines can be applied to
generate "special" Search Keys. One such routine, DOLBY, provides the
ability to generate Search Key files adapted to "noisy" or uncertain search
specifications. The Search Key file, called Author-Dolby, is obtained
by scanning the output from ZODIAC and reducing the author surname to a
canonical or quasi-phonetic representation. In this form all vowels are
eliminated from the surname, and phonetically related consonants are reduced
to single forms. This Search Key file (AD) may be used to produce a set of
candidate retrievals when the user is uncertain of the pronunciation and/or
orthography of an author search. A second "noisy" Search Key file generator
has been programmed (but not yet implemented) for operation on the title
field. The resulting Search Key file will contain a permuted sequence of
information-bearing content words, so that title requests can be processed
effectively even where the searcher omits an article or preposition or
confuses the order of the words in a title.
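The general idea of a canonical surname form can be sketched as follows. This is only an illustration of "drop the vowels, merge related consonants"; the consonant classes below are hypothetical and are not the actual DOLBY rules, which are not reproduced in this report.

```python
# Hypothetical consonant classes chosen for illustration; the real DOLBY
# transformation rules are not given in this document.
CONSONANT_CLASSES = {
    "C": "K", "Q": "K", "G": "K",
    "Z": "S",
    "D": "T",
    "B": "P",
    "V": "F",
}
VOWELS = set("AEIOUY")


def canonical_surname(surname: str) -> str:
    """Reduce a surname to a vowel-free form with merged consonant classes."""
    out = []
    prev = None
    for ch in surname.upper():
        if not ch.isalpha():
            continue
        mapped = ch if ch in VOWELS else CONSONANT_CLASSES.get(ch, ch)
        # Keep a consonant unless it merely repeats the letter just before it.
        if mapped not in VOWELS and mapped != prev:
            out.append(mapped)
        prev = mapped
    return "".join(out)
```

With these assumed classes, spelling variants such as SMITH and SMYTH, or KOONTZ and KUNTZ, reduce to the same form and would therefore fall under the same key in a file built this way.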

After the Search Key records have been generated, they are sorted using
an IBM utility sort routine. The order of sort is: File Name (AU, SU, TI,
etc.); Search Key, block/track number, offset. These sorted files are then
presented to PAX, the routine which constructs the random access linked file
structure. The standard access arrangement for indexed files, which is
provided by IBM, has had the dual requirement that the Search Key be unique
(no two records with the same key value) and that the records be the same
length. Therefore, PAX analyzes the input records for duplicate key values
and processes all those with the same value as a string. A Search Key string,
therefore, is defined as one or more occurrences of the same value of the
Search Key.

An example from a hypothetical AU file follows:

                                Master File Record Address
Search Key Value                Track    Offset    Length    String

BACH, CHRISTIAN                    51      1343       600       1
BACH, JOHANN SEBASTIAN                      545       712       2
BACH, JOHANN SEBASTIAN             14      2506       455       2
BACH, JOHANN SEBASTIAN             30      1025       512       2
BACH, KARL PHILIPP EMANUEL          8      7210       650       3
BACH, KARL PHILIPP EMANUEL         13       150       475       3
BACH, WILHELM FRIEDEMANN                   4570       580       4

If a string consists of only one record, then the author is represented by
only one title in the file. A string also may consist of many Search Key
records, indicating multiple entries for an author in the Master File. In
the example above, strings one and four are single-record strings, whereas
strings two and three are multi-record.
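The sort and the grouping into strings can be sketched in a few lines. This is an illustration only, not the PAX routine; the tuple layout (file name, key, track, offset, length) and the function name key_strings are assumptions, and Python's default string ordering stands in for the collating sequence of the IBM sort.

```python
from itertools import groupby


def key_strings(records):
    """Sort Search Key records and group equal keys into strings.

    Each record is a tuple (file_name, key, track, offset, length).
    Returns a list of (file_name, key, [(track, offset, length), ...]).
    """
    # Sort order from the text: File Name, Search Key, track, offset.
    ordered = sorted(records, key=lambda r: (r[0], r[1], r[2], r[3]))
    strings = []
    for (file_name, key), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        strings.append((file_name, key, [r[2:] for r in group]))
    return strings


records = [
    ("AU", "BACH, JOHANN SEBASTIAN", 30, 1025, 512),
    ("AU", "BACH, CHRISTIAN", 51, 1343, 600),
    ("AU", "BACH, KARL PHILIPP EMANUEL", 13, 150, 475),
    ("AU", "BACH, JOHANN SEBASTIAN", 14, 2506, 455),
    ("AU", "BACH, KARL PHILIPP EMANUEL", 8, 7210, 650),
]
strings = key_strings(records)
```

Here the first string has one address and the other two have two addresses each, matching the distinction between single-record and multi-record strings.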
Following this analysis, PAX creates two new files: a Key Access file,
and a Key Locator file. There are two possible linkages from these files to
the Master File, as follows:



Key Access File --> Master File
    Used where only one master record has the key value.

or

Key Access File --> Key Locator File --> Master File
    Used where more than one master record has the same key value.



The simpler link occurs for each single-record string in the sorted Search
Key file. In that case there is only one Master File record in the data
base corresponding to the Search Key value in the Key Access file record.
Consequently, there is only one Master File record address and that may be
carried directly in the Key Access record. The second case occurs for each
multi-record string in the sorted Search Key file. In that case the Key
Access file carries the Search Key value, but not the full set of Master File
record addresses. These addresses are stored instead in sequential fixed
length records of the Key Locator file. The address in the Key Access file
is thus a pointer to a sequential string of records in a second-level Locator
file, each of which in turn points to a Master File record. Thus, to return
to our example of multi-record strings:



[Diagram: Key Access File entries pointing to records of the Key Locator File]




Thus consolidation occurs as a result of carrying each unique Search Key
value only once, thus reducing all multi-record strings to single-record
strings. In order to maintain a fixed-length record structure in the Key
Access file, Master File record addresses are transferred to a separate
(Key Locator) file. This two or three level linked file structure is the
final outcome of the file construction process and represents the data
base utilized by the search logic and retrieval portions of the CIMARON
program.
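The two-level and three-level linkages can be sketched as below. This is a simplified illustration, not the PAX file layout: representing a Key Access entry as either ("M", address) or ("L", first slot, count), and holding both files in memory, are assumptions made for the example.

```python
def build_index(strings):
    """Build Key Access and Key Locator structures from sorted key strings.

    strings: list of (key, [address, ...]) with each key appearing once.
    """
    key_access = {}   # key -> ("M", address) or ("L", first_slot, count)
    key_locator = []  # sequential Master File addresses for multi-record keys
    for key, addresses in strings:
        if len(addresses) == 1:
            # Single-record string: the address is carried directly.
            key_access[key] = ("M", addresses[0])
        else:
            # Multi-record string: point at a run of Key Locator records.
            key_access[key] = ("L", len(key_locator), len(addresses))
            key_locator.extend(addresses)
    return key_access, key_locator


def lookup(key, key_access, key_locator):
    """Return every Master File address recorded for an exact key value."""
    entry = key_access.get(key)
    if entry is None:
        return []
    if entry[0] == "M":
        return [entry[1]]
    _, start, count = entry
    return key_locator[start:start + count]
```

Every Key Access entry has the same shape regardless of how many master records share the key, which is the point of moving the variable-length address lists into the Key Locator file.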
2. CIMARON2 TERMINAL OPERATOR'S GUIDE



2.1 Overview

This document is a terminal operator's guide to CIMARON2 - the
latest version of an on-line search and retrieval facility, implemented
at the Institute of Library Research, Berkeley. CIMARON2 allows the
user to enter search requests (in a Boolean language) involving authors,
titles, and subjects, from a remote terminal and subsequently presents
the search results at the same terminal.

The hardware facility includes three Sanders 720 character display
terminals at the Information Processing Laboratory in the Library School
and an IBM 360/40 computer with a 2314 disk storage unit at the Campus
Computer Center. The software facility was developed under a Terminal
Monitor System (TMS) designed specially for the Institute's on-line
computing needs.

The guide is written with a view to conducting the reader through
a complete session with CIMARON2 at the terminal. All text appearing
on the display screen, whether generated internally by programs or
keyed in by the user, is shown throughout this document in upper-case
letters.



2.1.1 CIMARON2 Structure and Commands



In order to provide proper control and transfer between the program
components, CIMARON and its command structure are divided into six
phases. During each phase of the program, the user may choose from
available options in order to transfer to other phases of the program, to
submit a request, view the results, etc. Figure 6 lists these program
phases and the legitimate transfers between them. Figure 7 shows the
general commands available to the user during these phases. And finally
Figure 8 shows the commands available during the results display phase
to assist the user in viewing the retrieval result.



FIG. 6: PHASES OF CIMARON OPERATION AND LEGITIMATE TRANSFER PATHS

Phase #    Phase Type             Next Phase(s)

   1       Data Base Selection    2
   2       Request Formulation    2, 3, 6
   3       End Search             2, 6
   4       Start Display          5
   5       End Display            2, 4, 6
   6       File Closing           1, Terminate CIMARON
FIG. 7: GENERAL CIMARON COMMANDS

                                              Program Phases

Commands    1               2               3               4               5               6
            Data Base       Request         End Search**    Start Results   End Results     File Closing
                            Formulation                     Display         Display

//CLOSE     -               Go to Phase 6   See CLOSE       -               See CLOSE       -
EXIT        -               -               -               -               -               Terminate CIMARON
REOPEN      -               -               -               -               -               Go to Phase 2
SD          San Diego       -               -               -               -               -
SC          Santa Cruz      -               -               -               -               -
(null)***   Santa Cruz      Search same     Go to Phase 2   Show all        Go to Phase 2   Terminate CIMARON
                            request                         records
CLOSE       -               -               Go to Phase 6   -               Go to Phase 6   -
RESTART     -               -               Go to Phase 2   -               Go to Phase 2   -
EDIT        -               -               Go to Phase 2   -               Go to Phase 2   -

*Dash indicates command unavailable in this phase.

**End search is the phase arrived at if the search "failed"; Phase 4 is
arrived at if the search "succeeded."

***Indicates depression of SEND BLOCK key with no typed response.
FIG. 8: CIMARON COMMANDS AVAILABLE DURING RESULTS DISPLAY PHASE

Available
Commands*    Meaning

(null)       Display next record

KILL         Go to Phase 5, End Results Display;
             return to the display can still be
             accomplished through command Bn

BACK n       Display the nth record back in the list;
             n=0, or n>current position provides the
             first record; B and B1 are the same

SKIP n       Skip n records and display record following;
             n=0, or n>remaining records provides
             the last record

HARD         Send a copy of this record to the printer
             file; no change in display

MARC         Display current and following records
             in MARC II communications format

USER         Display current and following records
             in user format

*Underlined letters are the minimum typed characters to uniquely
define the command. Additional characters are optional.
2.1.2 Use of the Sanders 720 Keyboards



The Sanders 720 keyboard consists of two major groups of keys - the
alphanumeric group containing upper-case letters, digits, and punctuation
characters, and the function key group on the right hand side of the
keyboard. Of the former group, the user is not concerned currently with the
HOME, horizontal and vertical TAB, and CR keys, because all formatting is
usually controlled by the programs, and input is expected to be a pure
alphanumeric string. Of the latter group, the user is concerned with the
INSERT, DELETE, and SEND BLOCK keys. The INSERT and DELETE (blue keys), in
conjunction with the cursor positioning key (i.e., the SPACE key), enable
the user to edit his input string before dispatching it with the SEND BLOCK
function key.

Any message from within the program requires a user response. This
may be a simple acknowledgment on his part that he has read what is on
the screen, or it may be a command word or a search request. In any case
the following can be noted:

a. The cursor is positioned one or two lines below the last line
of the program message, often just beyond a right arrow (>), and blinks
steadily.

b. The user types in his input here, edits it if necessary, and
dispatches it by pressing the SEND BLOCK function key.

c. Both the message and the user's input are visible on the screen
until the next message appears.

d. Sometimes, when all three terminals are being used, the user's
response may go into a blink mode. This usually does not last more than
a few seconds and means the input is being queued before it is processed.

Also, whenever CIMARON2 prompts the user to select one of many
command options, the first alternative listed is usually a default
option which may be taken merely by pressing the SEND BLOCK key. This
causes a zero-length message to be sent to CIMARON2 as a signal for the
default command. A simple SEND BLOCK (i.e., a zero-length message) is
used elsewhere as an acknowledgment from the user. This is the same
convention if viewed as a default command to "continue." This procedure
allows the user to proceed rapidly along the most commonly used program
paths.

2.2 Entering CIMARON2

CIMARON2 may be called up on the terminal as a user service under
the Terminal Monitor System. This means that TMS brings into core a
copy of the CIMARON2 machine code stored in a program library on disk and
initializes the communication path between CIMARON2 and the appropriate
terminal. Since the program is reentrant, only one copy of it is in core
at any time, although communication paths may have been established from
CIMARON2 to more than one terminal. The detailed procedure for entering
CIMARON2 is described below.




2.2.1 Logging In 



When TMS has been initiated, the following messages appear on each
screen:

TMS100I - IN OPERATION

TMS101A - WAITING FOR LOG IN

The normal response to this is to type in either GP01 or OP02 and
send the message (depress the SEND BLOCK button). The initials of some
ILR personnel are also valid, but these are usually entered by those persons
when conducting debugging sessions or when running machine tutorial programs
(i.e., DISCUS). If one of the valid initials is received (say GP01), TMS
responds with:

TMS102I - GP01 LOGGED IN
TMS104A - SPECIFY PROGRAM

The terminal user has been logged in successfully and may now call
CIMARON2 by typing CIMARON2 followed by a SEND BLOCK. This results in
entry to the program and the display of a "title-page" message.

2.2.2 Selecting the Data-Base 

The initial "title-page" message lists the version of the program,
the date this version was first operational, the data bases currently
stored and indexed on disk, and the attributes via which the data base
master files can be searched. The last line requests the user to select
the data base, which he does by typing a two letter code. Currently,
there are two data-bases, and the codes are SC (for Santa Cruz University
Catalog) and SD (for San Diego Biomed Catalog). The Santa Cruz data-base
will be opened by default. The selection of the data-base at this point
ensures that all subsequently opened index (attribute) files will pertain
to the correct data-base. The code letters typed by the user will appear
in the top left-hand corner of the screen (HOME position). If the code
letters are invalid, the message SELDB will reappear, allowing the user to
try again.

2.3 Entering Search Requests 

Search requests are entered when the following prompt appears on the
top line of the screen:

CIMARON IS READY - ENTER BOOLEAN EXPRESSION:

The cursor is positioned two lines below this message, and the user
may type in the expression immediately. As always, user input is dispatched
to the computer by a SEND BLOCK at the end of the input. If the input string
typed by the user is detected to be empty (zero length), the above message
(MSG1) reappears, allowing the user to try again.

Three other options are provided here. One is to allow the user to
type //CLOSE, instead of a search request. Section 2.6 explains what
happens when CIMARON2 receives this command. The second is to allow
comments to be entered in the system log. A comment must begin with an
asterisk and may be up to 256 characters long; comment strings longer than
256 characters will be truncated. The third option is //SPEC, which causes
a program interrupt and transfers to the interrupt handler. This option is
for the use of programmers only.
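The handling of input at the request prompt can be summarized in a small sketch. It is an illustration only: the function name classify_input and the returned labels are invented here, and counting the leading asterisk toward the 256-character comment limit is an assumption.

```python
MAX_COMMENT = 256  # assumed to include the leading asterisk


def classify_input(text: str):
    """Classify one message received at the request prompt."""
    if text == "":
        return ("REPROMPT", None)      # zero-length input: MSG1 reappears
    if text == "//CLOSE":
        return ("CLOSE", None)         # file closing, see Section 2.6
    if text == "//SPEC":
        return ("INTERRUPT", None)     # programmers only
    if text.startswith("*"):
        return ("COMMENT", text[:MAX_COMMENT])  # long comments are truncated
    return ("SEARCH", text)            # anything else is a search request
```

For example, classify_input("* TERMINAL 2 SLOW TODAY") yields a COMMENT destined for the system log, while a quoted search key falls through to SEARCH.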

2.3.1 Search Keys and Attributes



The minimal form of an input request is a search key. A search key is
enclosed in single quotes and the first two characters identify the type of
attribute (associated with master file records) that the search key
represents. E.g.,

'AU/KINSEY'

'TI/THE HISTORY OF MEDICINE'

These represent search keys of the author and title variety respectively.
The attribute codes are AU and TI, and the slash following the code
is a mandatory delimiter. Currently, the legal attribute codes are AU, TI,
SU, AD. The last two represent subject and "Dolbyized" author. Note that
every character between the slash and the ending quote is significant,
including blanks, and is used in direct comparison with keys in index files.
In actuality, an inclusive Boolean OR is performed between the associated
master records of every key in the index file whose first part matches
(character for character) the submitted search key. This will henceforth
be referred to as the part-key facility. At present, all user-submitted
search keys are part-keys because they are truncated to 32 characters
(including the attribute code and delimiter) before they are entered in
tables, whereas the index files have key lengths of 40 or more.

It is important to realize, however, that in order to get a unique
match on a single record (key) in the index file, it is necessary only to
enter a sufficiently long part-key by consulting the index file listing.*
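The part-key facility amounts to a prefix match followed by a union of record sets. The sketch below illustrates this under stated assumptions: the index is held as a dictionary from full key (attribute code and slash included) to a set of master record identifiers, and the names and sample entries are hypothetical.

```python
MAX_KEY = 32  # submitted keys are truncated to 32 characters


def part_key_search(search_key: str, index: dict) -> set:
    """OR together the records of every index key that begins with the part-key."""
    probe = search_key[:MAX_KEY]
    hits = set()
    for key, records in index.items():
        if key.startswith(probe):   # character-for-character match on the first part
            hits |= records
    return hits


index = {
    "AU/KING, JOHN": {4},
    "AU/KINSEY, ALFRED": {1, 2},
    "AU/KINSMAN, ROBERT": {3},
}
```

A short part-key such as AU/KINS gathers records under both KINSEY and KINSMAN, whereas the longer AU/KINSEY narrows the match to one index key, as the paragraph above describes.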

An additional facility provided in this respect is defining hexadecimal
strings within the search key. A hexadecimal string is identified by the
fact that it is enclosed in reverse slashes. E.g.,

'SU/NURSING\FAA7\OBSTETRIC'

is a subject search key containing the hexadecimal string FAA7 (a MARC II
subfield delimiter followed by a lower-case x). This device is used
whenever characters to be included in a search key are not to be found
on the terminal keyboard, and gives access to the entire 8-bit character set.
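Expanding such a key into its byte string can be sketched as follows. The sketch assumes the typed characters are EBCDIC and uses Python's cp037 codec as a stand-in for the character set of the installation; the function name expand_key is invented for the example.

```python
import re

# A run of hexadecimal digits enclosed in reverse slashes.
HEX_RUN = re.compile(r"\\([0-9A-Fa-f]+)\\")


def expand_key(key: str) -> bytes:
    """Encode typed text as EBCDIC and splice in the hexadecimal strings."""
    out = bytearray()
    pos = 0
    for match in HEX_RUN.finditer(key):
        out += key[pos:match.start()].encode("cp037")
        # bytes.fromhex raises ValueError for an odd number of digits.
        out += bytes.fromhex(match.group(1))
        pos = match.end()
    out += key[pos:].encode("cp037")
    return bytes(out)
```

In this encoding the byte A7 is indeed a lower-case x, so the string FAA7 supplies the delimiter FA followed by a character the upper-case-only keyboard cannot produce.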



*Also known as authority listings. These are essentially printouts of
the author, title, and subject index files showing the keys and the
number of master records indexed under that key.
2.3.2 The AD Attribute



The letters AD represent the "Dolbyized" authors attribute. Search
keys preceded by this code are routed to a special index file in which
the keys are canonical forms of author names. The canonical form is
obtained by applying a series of transformations (rules) to an author
name. These rules, first proposed by Dolby,* are designed to smooth out
differences due to minor spelling variations (or errors) in English
surnames, and reduce the variants to a canonical form. Thus by applying
the algorithm to a search key and then looking for hits in an index file
of canonical names, one obtains a noisy match, i.e., one achieves greater
recall at the expense of precision.

The "Dolbyized" author index files contain not just author names
(in canonical form) but also associate authors, editors, translators,
etc. This further improves recall.

For example, searching the San Diego Biomedical file with the key
'AD/KINSEY' results in retrieving 10 records of which

2 were authored by KINSEY

1 was about KINSEY

4 were authored by KUNTZ

1 was authored by KOONTZ

1 was authored by KINSMAN

1 was authored by KUNSTADTER

In the last two names, only the first syllable is "close" to KINSEY.
This is because the part-key feature was automatically invoked after
the key typed by the user was "Dolbyized." In order to suppress the
part-key search on the Dolby file, the user may add a blank to the end
of the key thus: 'AD/KINSEY '. In this case, the last two names, viz.,
KINSMAN and KUNSTADTER, no longer appear in the retrieved records.

2.3.3 Search Requests in the Form of Boolean Expressions

So far, search requests composed of a single search key have been
described. However, CIMARON2 accepts more complex requests in the form
of parenthesized Boolean expressions wherein search keys with different
attributes can be mixed freely. For example, a more complex request
would be:

'SU/OPTICS' AND ('TI/FIBER OPTICS IN SURGERY OF THE EYE'
OR 'AD/KOONS') AND NOT ('SU/LENSES' OR 'SU/TECHNOLOGY')



*Dolby, James L., "An Algorithm for Noisy Matches in Catalog Searching,"
in Cunningham, Jay L., et al., A Study of the Organization and Search of
Bibliographic Holdings Records in On-Line Computer Systems: Phase I,
Berkeley: Institute of Library Research, University of California, March
1969.
This expression has five search keys, three attribute types - SU,
TI, and AD, and uses all three Boolean operators - AND, OR, NOT. The
operators have an implied precedence as follows:

NOT    highest
AND    next highest
OR     lowest

The NOT is a unary operator and associated with the search key or
sub-expression immediately to its right. The AND and OR are binary operators
having a left-operand and a right-operand, each of which may be a search
key or sub-expression. Between similar operators the implied precedence
is left-to-right. Parentheses are used to define explicitly a group of
operations of higher precedence; the higher the nesting level, the higher
the precedence.
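The precedence rules can be made concrete with a small recursive-descent evaluator over sets of record identifiers. This is a sketch under simplifying assumptions, not the CIMARON2 search logic: keys are looked up exactly rather than as part-keys, NOT is computed against an explicit set of all records (the universe argument), and the names tokenize and evaluate are invented for the example.

```python
import re

# A quoted search key, a parenthesis, or one of the three operators.
TOKEN = re.compile(r"\s*(?:'([^']*)'|(\(|\))|(AND|OR|NOT)\b)")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"cannot parse request at position {pos}")
        if m.group(1) is not None:
            tokens.append(("KEY", m.group(1)))
        else:
            tokens.append(("OP", m.group(2) or m.group(3)))
        pos = m.end()
    return tokens


def evaluate(text, index, universe):
    """Evaluate a request; index maps key -> set of record ids."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1

    def parse_or():                      # lowest precedence
        result = parse_and()
        while peek() == ("OP", "OR"):
            take()
            result = result | parse_and()
        return result

    def parse_and():                     # next highest
        result = parse_not()
        while peek() == ("OP", "AND"):
            take()
            result = result & parse_not()
        return result

    def parse_not():                     # highest, unary, binds to its right
        if peek() == ("OP", "NOT"):
            take()
            return universe - parse_not()
        return parse_primary()

    def parse_primary():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of request")
        take()
        if tok[0] == "KEY":
            return set(index.get(tok[1], ()))
        if tok == ("OP", "("):
            result = parse_or()
            if peek() != ("OP", ")"):
                raise ValueError("missing right parenthesis")
            take()
            return result
        raise ValueError(f"unexpected {tok[1]!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected text after request")
    return result
```

Each precedence level is one function, and the loops in parse_or and parse_and give the left-to-right grouping between similar operators.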

There are two important limitations on the complexity of the requests
the user may type in:

a. The maximum number of search keys allowed in the expression is
sixteen (16).

b. The maximum length of the search request, including all blanks,
is 256 characters.

Since a line of the CRT screen accommodates 84 characters, long expressions
have to be typed over more than one line. On typing beyond one line, the
cursor automatically returns to the beginning of the next line, so no carriage
control functions are required. On the other hand, by controlling the cursor
position and using the INSERT and DELETE function keys, local editing of
the search request may be performed before dispatching it with a SEND BLOCK.

2.4 Search Results

There are many ways in which a search may end. It may end in a
diagnostic due to an incorrect construction of the search request or due
to storage overflow; it may result in reporting certain unusual results,
or it may report that a finite set of records satisfies the user's request.
These cases are explained individually next.

2.4.1 Diagnostics

CIMARON2 currently provides ten different diagnostics when something
goes wrong either in the analysis of the input request or in the search.
These are numbered DM0 through DM9. Of these, DM0 through DM6 report
incorrect constructions, syntax errors, etc., in the input expression;
DM7 and DM8 report file search failures; and DM9 is a special warning
indicating partial search failure due to a storage block overflow being
detected during the search. More will be said about DM9 in the next section.
As an example, suppose the user entered the following search request:

'AU/SMITH' AND 'AU/JONES

with a missing ending apostrophe; CIMARON2 immediately responds with
diagnostic message 3 (DM3) which appears at the top of the screen as
follows:

UNBALANCED APOSTROPHES IN THE EXPRESSION - EDIT:

The improperly constructed expression is displayed 2 lines below this
message. Once again, the user may employ the INSERT and DELETE function
keys in conjunction with the cursor position control key (SPACE) to
edit the message and send it. In the current example, adding an apostrophe
after the S in JONES will correct the expression successfully and result
in a search.
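A pre-search check of this kind, together with the two limits stated in Section 2.3.3, might look like the sketch below. Only the DM3 wording is taken from this guide; the other two messages, the function name check_request, and the idea of returning a list of problems are assumptions for illustration.

```python
MAX_KEYS = 16      # maximum number of search keys in one expression
MAX_LENGTH = 256   # maximum request length, blanks included


def check_request(text: str):
    """Return a list of problems found before any search is attempted."""
    problems = []
    if len(text) > MAX_LENGTH:
        problems.append("request longer than 256 characters")
    quotes = text.count("'")
    if quotes % 2 != 0:
        # Wording of DM3 as shown above.
        problems.append("UNBALANCED APOSTROPHES IN THE EXPRESSION")
    elif quotes // 2 > MAX_KEYS:
        problems.append("more than sixteen search keys")
    return problems
```

Counting apostrophes is enough to catch the example above, since every search key contributes exactly two of them.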

2.4.2 The List Overflow Warning

This is an interim feature in the program to detect and report the
overflow of any of the internal record lists during the progress of the
search. When corrective action is incorporated in CIMARON2 for this
condition, the warning message (DM9) will be removed and the user will
be unaware of the condition. An example of the appearance of DM9 is
given below. An overflow condition is detected during the search for
'AU/TOYNBEE' in the Santa Cruz data base when the hundredth record
indexed under this key is read, and DM9 appears thus:

*WARNING* - LIST OVERFLOW DETECTED, CUTOFF OCCURRED AT THE
FOLLOWING KEY:

AU/TOYNBEE, ARNOLD JAMES, D1819-

+ SEND BLOCK TO PROCEED +

The middle line shows the last key read from the author index file
before overflow was detected; the last line indicates CIMARON2 is awaiting
the user's acknowledgment. If the letters 'KKKK' appear just after
the slash in the middle line, it means the last key read has no special
significance since the overflow was detected during a Boolean OR operation.

When the user acknowledges this message by pressing SEND BLOCK,
CIMARON2 proceeds to report the result of the search. The result will
not reflect the true contents of the data base since some part of the
search was prematurely ended. DM9 is the only diagnostic followed by
a report of search results.

2.4.3 Unusual Results

Since search requests are in Boolean form and the concept of negation
(using the NOT operator) is included, there are three types of "unusual"
results that may occur. These are:
2.4.3.1 None



None of the records in the data-base file meet the conditions of
the search request. This will occur in search requests specifying a
conjunction of two mutually exclusive (record) sets, e.g.,

'AU/CHURCHILL, WINSTON' AND 'AU/HITLER, ADOLF'

CIMARON2 will report this search as follows:

NO RECORDS SATISFY REQUEST                 (MSG2)

NO RETRIEVAL                               (MSG7)

OPTIONS ARE: EDIT, RESTART, CLOSE          (MSG4)

The user is forced to bypass the record retrieval routines. The
default response is EDIT, which results in the following message:

THE LAST SEARCH REQUEST WAS                (MSGE)

Below this is redisplayed an exact copy of the last search request which
the user may edit using the function keys. If he wants to repeat the
same search, he takes the default option by pressing SEND BLOCK. RESTART
causes MSG1 to reappear on the screen and allows the user to submit a new
search request. EXIT processing is described in Section 2.6.

2.4.3.2 All

All the records in the data base file are defined by the search
request. This occurs whenever a search request specifies a disjunction
of two complementary sets. For example,

'SU/HISTORY' OR NOT 'SU/HISTORY'

would result in the following messages:

CONGRATULATIONS-YOUR REQUEST SPECIFIES THE ENTIRE FILE    (MSG3)

NO RETRIEVAL                                              (MSG7)

OPTIONS ARE: EDIT, RESTART, CLOSE                         (MSG4)

Again the user is not given the option of retrieving records.

2.4.3.3 'All but'

The last "unusual" case is one in which the search request specifies
all but a small portion of the data base. This occurs for negated
requests of the type:

e.g., 1    NOT 'AU/TOYNBEE, ARNOLD'

e.g., 2    NOT ('SU/PSYCHIATRY' OR 'SU/PSYCHOLOGY')

For these, the search results would be reported thus:

ALL EXCEPT XXX RECORDS SATISFY REQUEST
HOW MANY EXCEPTIONS ARE DESIRED?           (MSG6)

Here, XXX represents a 3-digit number. The user, for obvious reasons, is
permitted only to retrieve the exceptions. The allowable responses to
this message are the same as those for MSG5, which appears after searches
ending in normal results (see Section 2.4.4).



2.4.4 Normal Results

In the usual case, the search ends by accumulating a finite number
of records representing a small subset of the data-base. This is
reported by CIMARON2 as follows:

XXX RECORDS SATISFY REQUEST
HOW MANY ARE DESIRED?                      (MSG5)

Once again, XXX stands for a 3-digit number denoting the search
count. There are four allowable responses to this message and MSG6
(Section 2.4.3.3), namely: ALL, NONE, a number, or //FRMT.

ALL is the default and is taken to mean that the user wishes to look
at all the records reported in the search count. It results in entry to
the record retrieval routines (see Section 2.5).

NONE means the user wishes to bypass the retrieval routines. It
results in the following messages:

END OF RETRIEVAL
000 RECORDS DISPLAYED                      (MSG8)

OPTIONS ARE EDIT, RESTART, CLOSE, BACKUP   (MSG4)

BACKUP is one of the list control commands, and its use is explained
in Section 2.5.1.

The user also may type a number less than or equal to the search
count reported in MSG5 or MSG6. Appropriate validity checking is
performed and a number greater than the search count is taken to mean ALL
whereas any number evaluating to zero is equivalent to NONE.

FRMT is a command word indicating the user's desire to design the
record display format before retrieval. Currently, the user is limited
to a choice between two fixed formats. The following message appears
when the user types //FRMT in response to MSG5 or MSG6:

SELECT RECORD DISPLAY FORMAT: 'MARC2' OR 'USER'    (MSG9)

On typing either (MARC2 is the default) MSG5 or MSG6 (as the case
may be) reappears, and either the number of records to be retrieved may be
indicated, or the format may be changed once again.
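The interpretation of a response to MSG5 or MSG6 can be sketched as a single function. The function name and the returned labels are invented for illustration; the rules themselves (ALL as default, a number above the count meaning ALL, zero meaning NONE) are those stated in this section, and anything unrecognized is simply asked for again.

```python
def interpret_count_response(response: str, search_count: int):
    """Decide what to do with the reply to MSG5 or MSG6."""
    text = response.strip().upper()
    if text == "" or text == "ALL":
        return ("RETRIEVE", search_count)   # ALL is the default
    if text == "NONE":
        return ("END", 0)                   # bypass the retrieval routines
    if text == "//FRMT":
        return ("FORMAT", None)             # choose a display format first
    if text.isascii() and text.isdigit():
        n = int(text)
        if n == 0:
            return ("END", 0)               # zero is equivalent to NONE
        return ("RETRIEVE", min(n, search_count))  # too large means ALL
    return ("REPROMPT", None)               # not ALL, NONE or a number
```

The min call implements "a number greater than the search count is taken to mean ALL" without a separate branch.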

If, after checking for the format command, CIMARON2 fails to find ALL,
NONE or a number in response to MSG5 or MSG6, the following prompt appears:

YOUR OPTIONS HERE ALL, NONE OR A NUMBER    (DM1)

Note that all the cases described above in Sections 2.4.3 and 2.4.4
may be preceded by the warning diagnostic (DM9), described in 2.4.2, in
which case the reported search results are only partially right.

2.5 Record Retrieval

Several controls are provided to the user over the display of master
file records. The first is selection of the number of records to be
displayed. This is indicated either by responding ALL (the default response)
to MSG5 or MSG6, or by typing a number less than or equal to the count
reported by CIMARON2 in MSG5 or MSG6. The ALL response results in
sequential retrieval of every master file record addressed by the result
list (negated or not), in ascending order of disk address. Search results
normally are displayed in this order unless the list control commands are
employed to force a different sequence.

2.5.1 The List Control Commands

The List Control Commands provide the user with the ability to move
freely forward or backward along the list of master records to be
displayed, to change the display format between records, and to terminate
retrieval after any record. At the end of any record display the allowable
commands are:

K (for Kill) to stop further display of records

B (for Backup) to display the record previous to the current one

SK (for Skip) to display the record after the next one

M (for Marc) to redisplay the current record in MARC format

U (for User) to redisplay the current record in user format

H (for Hard-copy) to obtain a hard-copy of the current record.

Backup and Skip may be followed optionally by a number. Backup
moves back 1 and displays that record; Skip moves forward 1 and
displays the record following it; thus B n (where n is a number of 1
or more digits) moves back n records, whereas SK n moves forward n records
and displays the (n+1)th record. Also,

B 1     is equivalent to B

SK 1    is equivalent to SK

B 0     always goes to the first record

SK 0    always goes past the last record

B n     where n exceeds the search count is like B 0

SK n    where n exceeds the search count is like SK 0
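The arithmetic of Backup and Skip can be sketched as two small functions. The model is an assumption made for illustration: records are numbered from 0, current is the position of the record on display, total is the number of records selected, and a result of None stands for moving past the last record. The sketch follows the wording of this section.

```python
def backup(current: int, n: int = 1) -> int:
    """Position displayed after B n."""
    if n == 0 or n > current:
        return 0              # B 0, or too large an n, gives the first record
    return current - n


def skip(current: int, total: int, n: int = 1):
    """Position displayed after SK n, or None when past the last record."""
    target = current + n + 1  # skip n records, then show the one following
    if n == 0 or target >= total:
        return None           # SK 0, or too large an n, runs off the end
    return target
```

For instance, with the first of ten records on display, SK shows the third record and SK 3 shows the fifth, while B from the sixth record shows the fifth.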



M and U redisplay the current record only if the current format is U and
M respectively, or else they just continue to the next record. In other
words, if the format is changed, the current record is redisplayed in the
new format; otherwise the next record is displayed in the same format.

The hard-copy command sends the current display to the print file and
makes no other change. Thus the user may type any of the other commands
or just SEND BLOCK for the next record.



2.5.2 Record Display Format



The user has a choice of one of two display formats for each record.
The two formats available are the LC MARC II format* and a more readable
user format. The minor changes applied to the MARC II format prior to
display are:



a. All codes other than those for punctuation and alphanumerics
are translated to blanks. Such codes will be known as non-printing
characters. The exceptions are as follows:

Original char.    Translated
(hex. code)       char.         Interpretation

1F                %             begin subfield (old MARC format)
26                +             end of field
37                *             end of record
FA                $             begin subfield

b. Lower-case alphabetics are translated to upper-case since the
display terminals have no provision for these.

c. The following four punctuation characters are translated to
blanks since they cause carriage control effects on the display screens:

Char.    Hex. code    Car. Control Effect

¢        4A           Horizontal tab
|        4F           Vertical tab
¬        5F           Carriage return
#        7B           Home cursor



*Library of Congress, Books: A MARC Format. Specifications for Magnetic
Tapes Containing Monographic Catalog Records in the MARC II Format. 4th ed.
Washington, D.C.: Information Systems Office, April 1970. 70 p.
The user format is a line-indented format with non-printing
characters appearing as dots. The end of each field in the record is
denoted by +, except for the last field which ends with * denoting end
of record. The various fields are identified by brief mnemonics which
are explained below:

     Mnemonic     Type of field

     REC          ILR Record accession number,
                  published date and call number
     MEH          Main entry heading (usually author)
     TIT          Title
     [illegible]  Imprint
     PAG          Pagination
     SEE          Series note
     NOTE         General notes
     SUB          Subject heading
     OTH          Other headings (usually co-authors)



In both formats, after displaying a record (or screenful, if the
record is long enough) an acknowledgment is awaited from the user before
the next record (or screenful) is displayed. This usually is requested
below the last line of the record as:

     + SEND BLOCK TO PROCEED +     or

     + SEND BLOCK FOR NEW PAGE +

The user's usual response to these messages is to depress the SEND
BLOCK key, whereupon a zero-length message, indicating that the user would
like to proceed, is sent from the terminal. The user may, however, type a
list control command at the end of a record (this may be confirmed by an
asterisk at the end of the last field of the record). The command is typed
just after a right arrow > at the beginning of the last line on the screen
and is, naturally, followed by a SEND BLOCK. Note that on receiving a zero-
length message (i.e., a pure SEND BLOCK) after any record, CIMARON2 assumes
the user still wishes to retrieve the number of records he originally indicated
in answer to MSG5 or MSG6, in the currently prevailing format, until he types
'K', 'M', or 'U' after some succeeding record.

The retrieval process is ended in one of three ways: the user typed
NONE or 0 to MSG5 or MSG6, the user typed 'K' after some record, or
all the records requested by the user have been displayed. In any
case, MSG8 followed by MSG4 appears thus:

     END OF RETRIEVAL                                    (MSG8)
     XXX RECORDS DISPLAYED

     OPTIONS ARE: EDIT, RESTART, CLOSE, BACKUP           (modified MSG4)

The BACKUP option provided gives the user one last chance to get
back into the display list. If this opportunity is not taken, the list
is destroyed and can be recreated only with another search. As before,
EDIT is the default option.

The action of CIMARON2 on receiving EDIT and RESTART has been indi-
cated in previous sections. EDIT processing forms the subject of the next
section.

2.6 Exiting from CIMARON2

When the user types CLOSE in response to either MSG1 or MSG4, a delay
of a few seconds occurs while CIMARON2 proceeds to close all the files
either explicitly or implicitly opened by the user thus far. The data base
was selected explicitly and opened by the user (see Section 2.2.2), whereas
index files were implicitly opened the first time the associated attribute
code was employed in a search expression. At the end of this "shut down"
procedure, MSG0 appears:

     ALL FILES CLOSED

     OPTIONS ARE: EXIT, REOPEN                           (MSG0)

REOPEN indicates that the user wishes to begin search operations
anew on a different data base (it would be wasteful to close a data base
and reopen it immediately; therefore, EDIT or RESTART should be used to
continue operations on the current data base), and results in the reappearance
of the "title page" message described in Section 2.2.2. The user then may
proceed as before with the new data base.

EXIT, which is the default response here, indicates that the user
is finished with CIMARON2 and will return control to the Terminal
Monitor System, at which point the following TMS messages appear:

     TMS106I - NORMAL EXIT FROM USER PROGRAM

     TMS104A - SPECIFY PROGRAM

The user may at this point recall CIMARON2, but once again it is a
wasteful exercise. If the user desires to sign off and leave the terminal,
he responds with LOGOUT, whereupon TMS comes back with:

     TMS105I - XXXX LOGGED OUT

     TMS101A - WAITING FOR LOGIN

where XXXX may be GP01, GP02, GP03, or the initials of some ILR personnel.

2.7 Disastrous Ends in CIMARON2

These are of two types:

a. Disasters ending in failure of TMS due to failure of the IBM
operating system or hardware failure in the CPU or the teleprocessing
system or due to one of the user programs damaging parts of TMS. Such
conditions will cause abrupt loss of response from the terminal and
require a reinitialization of TMS.

b. Disasters detected and trapped by TMS, originating from within
CIMARON2. In such cases, TMS puts out appropriate messages about the
nature of the disaster and purges that user's storage blocks, file buffers,
and working areas. The user will have lost communication with CIMARON2;
however, he can recall the program and begin anew. A typical example is
given below:

Failure in opening a file. TMS puts out:

     TMS153I - ATTEMPT TO OPEN AN UNAVAILABLE/UNCATALOGED DATA SET
     TMS110I - ABNORMAL RETURN FROM USER PROGRAM VIA PURGE ROUTINE

     TMS104A - SPECIFY PROGRAM

For other such TMS messages, the user is referred to Appendix 2 in
Part I of the TMS Users' Manual.*

2.8 CIMARON2 Messages and Code Tables

CIMARON2 ordinary messages are numbered MSG0 through MSG9; diagnostic
messages are numbered DM0 through DM9. SELDB is the last line of the
"title page" message. ACK is an acknowledgment request. The first line
of multi-line messages usually appears at the top left-hand corner of the
screen, i.e., the HOME position, while succeeding lines appear double-
spaced below it. The messages currently available are listed in Figure 9.
In the event that one wishes to interpret the internal codes for displayed
or non-displayable characters, Figure 10 provides the equivalent codes.
EBCDIC is the standard internal code in present files.



*Smith, Stephen F. and William Harrelson, TMS: A Terminal Monitor System
for Information Processing, Berkeley: Institute of Library Research,
University of California, 1971, p. 43-46.
FIG. 9: CIMARON2 MESSAGES

Message #   Text

MSG0        ALL FILES CLOSED
            OPTIONS ARE: EXIT, REOPEN

MSG1        CIMARON is ready - ENTER BOOLEAN EXPRESSION:

MSG2        NO RECORDS SATISFY REQUEST

MSG3        CONGRATULATIONS - YOUR REQUEST SPECIFIES THE ENTIRE FILE

MSG4        OPTIONS ARE: EDIT, RESTART, CLOSE

MSG5        XXX RECORDS SATISFY REQUEST
            HOW MANY ARE DESIRED?

MSG6        ALL EXCEPT XXX RECORDS SATISFY REQUEST
            HOW MANY EXCEPTIONS ARE DESIRED?

MSG7        NO RETRIEVAL

MSG8        END OF RETRIEVAL
            XXX RECORDS DISPLAYED

MSG9        SELECT RECORD DISPLAY FORMAT: MARC2 OR USER

MSGE        THE LAST SEARCH REQUEST WAS:

DM0         THE EXPRESSION CONTAINS ADJACENT OPERATORS - EDIT:

DM1         INCORRECT USE OF THE "NOT" OPERATOR - EDIT:

DM2         UNBALANCED APOSTROPHES IN THE EXPRESSION - EDIT:

DM3         INVALID SYNTAX IN THE EXPRESSION - EDIT:

DM4         THE EXPRESSION CONTAINS ADJACENT OPERANDS - EDIT:

DM5         NO SEARCH KEYS IN THE EXPRESSION - RETYPE OR EDIT:

DM6         UNBALANCED PARENTHESIS IN THE EXPRESSION - EDIT:

DM7         I/O ERROR IN SEARCHING INDEX FILE " " - EDIT:

DM8         ILLEGAL INDEX CODE " " IN SEARCH KEY - EDIT:

DM9         * WARNING * - LIST OVERFLOW DETECTED, CUT OFF
            OCCURRED AT THE FOLLOWING

SELDB       + SELECT DATA BASE: TYPE "SD" OR "SC", THEN SEND BLOCK +

ACK         + SEND BLOCK TO PROCEED +

DMN         YOUR OPTIONS HERE ARE: ALL, NONE OR A NUMBER



FIG. 10: EQUIVALENCE TABLE OF GRAPHIC REPRESENTATIONS
AND INTERNAL CODES (LISTED IN EBCDIC SEQUENCE)

NAME                       EBCDIC   ASCII    ESCAPE*   ASCII 6-BIT
                           (HEX)    (HEX)    (OCTAL)   (OCTAL)

Null                       00       00       73        00
                           01
Double Underscore          02       F5                 65
Angstrom                   03       EA       75        52
                           04
                           05
                           06
Delete                     07       7F                 77
Circumflex                 08       E3                 43
Cedilla                    09       F0                 60
Superior Dot               0A       E7       75        47
Left Hook                  0B       F7                 67
Right Hook                 0C       F1       75        61
Inverted Cedilla           0D       F8                 70
Hacek                      0E       E9                 51
Acute                      0F       E2                 42
Double Acute               10       EE                 56
Umlaut                     11       E8                 50
Dieresis                   12       FC       75        74
Tape Mark                  13       17                 27
                           14
                           15
Backspace                  16       08       73        10
Idle                       17       16
Candrabindu                18       EF       75        57
Macron                     19       E5                 45
                           1A
Double Dot Below           1B       F3       75        63
Dot Below                  1C       F2                 62
Circle Below               1D       F4       75        64
High Comma                 1E       FE       75        76
High Comma (off center)    1F       ED       75        55
                           20
                           21
                           22

*ASCII 6-Bit Code containing no escape code is in standard set.
 Escape code = 73 (octal) - Non Standard Set I
               75 (octal) - Non Standard Set II
FIG. 10 (Cont.)

NAME                       GRAPHIC  EBCDIC   ASCII    ESCAPE*   ASCII 6-BIT
                                    (HEX)    (HEX)    (OCTAL)   (OCTAL)

High Question              ?        23       E0                 40
                                    24
Line Feed                           25       0A       73        12
End of Field                        26       1E       73        36
                                    27
Upadhmaniya                         28       F9                 71
Tilde                               29       E4       75        44
                                    2A
Grave                               2B       E1       75        41
Breve                               2C       E6       75        46
Double Tilde 1st Half               2D       FA       75        72
                                    2E
Double Tilde 2nd Half               2F       FB                 73
Ligature 1st Half                   30       EB       75        52
Ligature 2nd Half                   31       EC       75        53
                                    32
                                    33
                                    34
                                    35
                                    36
End of Transmission                 37       1D       73        35
                                    38
                                    39
                                    3A
                                    3B
Patent                              3C       AA                 12
Flat                       b        3D       A9                 11
Open Bracket               [        3E       5B       73        73
Close Bracket              ]        3F       5D       73        75
Space                               40       20                 00
                                    41
                                    42
                                    43
                                    45
                                    46
                                    47
                                    49
                                    4A
Period                     .        4B       2E                 16
Less Than                  <        4C       3C                 34


FIG. 10 (Cont.)

NAME

Open Paren
Plus
Ampersand
Miagkii Znak
Tverdyi Znak
Alif
Ain
Exclamation Point
Dollar Sign
Asterisk
Close Paren
Semi Colon
Minus, Hyphen
Slash
Middle Dot
British Pound
Comma
Percent
Underline
Greater Than
Question Mark


FIG. 10 (Cont.)

NAME                       GRAPHIC  EBCDIC   ASCII    ASCII 6-BIT
                                    (HEX)    (HEX)    (OCTAL)

                                    78
                                    79
Colon                      :        7A       3A       32
Cross Hatch
                                    7C
                                    7D
                                    7F
                           h        88       68       48
Lower Case I               i        89       69       51
Lower Case M               m        8A       B5       25
Lower Case Cross D                  8B       B3       23
Lower Case Eth                      8C       B4
3. BROWSER2 TERMINAL OPERATOR'S GUIDE



3.1 Overview

BROWSER2 is an independent routine, operating under TMS, which may be
used to scan currently stored index files, to save index terms temporarily,
and to obtain hard copy of the displayed terms. Our current development
plans call for increasing the ease of communication between BROWSER and
CIMARON, and eventually for their incorporation into a single program system
with two operating modes.

Currently, however, BROWSER is a separate program and must be entered
using BROWSER2 as the program name following the TMS program select command:
TMS104A - SPECIFY PROGRAM. Once BROWSER2 is entered, two informational dis-
plays are available if positive replies are given to the first two BROWSER2
questions.

     Q01: DO YOU WANT OPERATING INSTRUCTIONS?

A YES response results in a summary page of operating instructions being
displayed (see Figure 11).

     Q02: DO YOU WANT A LIST OF ACCESSIBLE FILES?

A YES response will display the current index file inventory.
The files currently available in this inventory are:

     SCAU1 - Santa Cruz   Author
     SCSU1 - Santa Cruz   Subject
     SDAU1 - San Diego    Author
     SDTI1 - San Diego    Title
     SDSU1 - San Diego    Subject
     SDAD1 - San Diego    Dolbyized Author



After the initial BROWSER informational screen displays, the major BROWSER
command functions are:

a. Select a data base index file

b. Select which portion of the index file is to be examined

c. Advance the display

d. Save an index file entry

e. Display the Save Area List

f. Remove a term from the Save Area List

g. Print a hard copy version of the index file display or of the Save
Area List

h. Exit from file examination or from BROWSER2

i. Display the available commands or file names.



FIG. 11: CURRENT BROWSER COMMANDS

COMMAND        MEANING OF COMMAND

sendblock      display advance

F x␢           forward x terms
               (x should be less than 100)

G 'xxxx'       get index entry xxxx

S x␢           (1≤x≤10) move line x to save area

R x␢           delete line x from save area

D              transfer from current display (save area or index area)
               to the other (index or save)

H              hard copy output of current display

//Close        close current file

//Exit         exit from BROWSER



3.2 BROWSER Commands

3.2.1 Select Data Base Index File

Following the informational displays or on exit from a prior file, BROWSER2
requests that the user SPECIFY FILE NAME. The proper response to this is to
enter the name of any legitimate index file, e.g. SCSU1. The index file name
may be entered only as a response to the BROWSER2 request to specify a file
name. Entering the name of a non-existent file will cause a syntax error,
and BROWSER2 will request that the name be re-specified.

3.2.2 Select Portion of Index File to be Displayed

This command enables the user to specify an alphabetic key as an initial
display value. The BROWSER2 message is:

     SPECIFY KEY OR PARTIAL KEY.

The format of the response is:

     G 'Key Value' (e.g. G 'LIBRARIES').

The attribute value (AU, SU, etc.) need not be specified, since it is implicit
in the index file name which has been used for selection. BROWSER will use
the alphabetic value of the Key Value entered to begin an alphabetically ordered
display of index file entries. This is shown in Figure 12. The display
contains ten entries, and each entry is numbered. The display also gives a
count field which expresses the number of master file records which are linked
to this index file entry. This count is effectively the number of works
indexed by an individual index descriptor. If the G 'Key Value' gives a
value which does not exist in the index file, then the display begins with
the next legitimate index file entry.

FIG. 12: BROWSER2 DISPLAY OF INDEX TERMS

                                                          COUNT

      1.  LIBRARIES $X AFRICA $X DIRECT*                  0001
      2.  LIBRARIES $X ANECDOTES, FACETIAE, SATIRE*       0001
      3.  LIBRARIES $X AUSTRALIA*                         0003
      4.  LIBRARIES $X AUSTRALIA+                         0002
      5.  LIBRARIES $X AUTOMATION*                        0002
      6.  LIBRARIES $X AUTOMATION $X CONGRESSES*          0002
      7.  LIBRARIES $X BIBL*                              0001
      8.  LIBRARIES $X CALIFORNIA*                        0002
      9.  LIBRARIES $X CALIFORNIA $X PERIOD*              0001
     10.  LIBRARIES $X CALIFORNIA $X PERIOD+              0001



Of note here are the meanings of the special characters $X, *, and +.
The $X is the MARC subject heading subfield $X and is used to identify a topic
subdivision of a subject heading. In the Santa Cruz file (from which this
example is taken), $X is a default value and will be used to identify geo-
graphic and chronological subject subdivisions as well as topical. The
symbols * and + correspond to the MARC end-of-field and end-of-record
signals.

Also of note are the double index file entries for lines 3-4 and 9-10.
It is the current practice of CIMARON not to ignore the MARC end-of-field
and end-of-record signals nor to treat these as logically equivalent codes.
Consequently, the same subject heading will be entered twice if it occurs
both as a final and a non-final field in two or more master file records.
This practice does not affect search requests, and will probably be
eliminated from future versions of CIMARON.

3.2.3 Advance the Display

In order to move the display, the user may specify a new Key Value (e.g.
G 'ECONOMICS') or he may move forward in the file a fixed number of terms by
entering:

     F X␢ (e.g. F 5 or F 10 or F 100)

where X␢ is any number followed by a blank space. The display will then be
advanced by X terms. When the next screen is displayed, it should be noted
that the lines (i.e. index file entries) will be numbered consecutively from
1-10. To move backward in the list, only the G command may be used. WARNING:
Only numeric values of less than 100 should be used since the intervening
terms must be retrieved and counted serially in order to maintain correct
positioning. For larger moves use the G command.
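
The positioning behavior of the G and F commands can be illustrated with a small sketch over a sorted list of index terms. This is not the BROWSER2 implementation: the function names are hypothetical, the index file is stood in for by an in-memory list, and Python's default string ordering is assumed in place of the collating sequence of the real files.

```python
from bisect import bisect_left

SCREEN_SIZE = 10  # the display contains ten entries


def get(keys, key_value):
    """G 'key': index of the first entry that is >= key_value.

    If key_value is absent, this is the next legitimate entry.
    """
    return bisect_left(keys, key_value)


def forward(keys, position, x):
    """F x: advance the display by x terms, stopping at the end of the file."""
    return min(position + x, len(keys))


def screen(keys, counts, position):
    """Return the (line number, term, count) rows for one display."""
    window = keys[position:position + SCREEN_SIZE]
    return [(n, term, counts[term]) for n, term in enumerate(window, start=1)]


if __name__ == "__main__":
    keys = ["ECONOMICS", "LIBRARIES", "LIBRARIES $X AFRICA", "MUSIC"]
    counts = {k: 1 for k in keys}
    for row in screen(keys, counts, get(keys, "LAW")):
        print(row)
```

Every screen is renumbered from 1, which matches the remark that the lines of each new display are numbered consecutively from 1-10. The binary search used for G is a convenience of the in-memory list; the serial counting that makes large F moves slow in BROWSER2 is not modeled.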

3.2.4 Save an Index File Entry

Since the ultimate goal of browsing is to collect candidate terms for
CIMARON2 search requests, BROWSER2 allows users to save index file entries
for later use. This is done by transferring entries from the index file
display to a special Save List or Save Area. The command to transfer an index
file entry to the Save Area is:

     S X␢ (e.g. S 5␢)

where X␢ is the entry number (from 1-10) of the term to be saved. A number
larger than 10 will result in a syntax error. The result of this command is
to add the selected term to the end of the Save Area List.

3.2.5 Display the Save Area List

In order to review the current contents of the Save Area List, the user
enters the command

     D.

This causes the Save Area List to be displayed. The Save Area List looks
exactly like an index file display. That is, each line is numbered
consecutively from 1-10 and contains the term and its count. In order to
return to the display of index file entries, the command is re-entered.

3.2.6 Remove a Term from Save Area List

Frequently, the Save Area List may need to be pruned. In order to do
this the following command is used:

     R X␢ (e.g. R 5␢)

where X␢ is any number from 1-10 followed by a blank. The command results in
the deletion of the Xth term from the Save Area List.
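
The S and R commands amount to appending to and deleting from an ordered list. The following sketch shows one way this could be modeled; the class and method names are hypothetical, and raising `ValueError` stands in for the syntax error that BROWSER2 reports.

```python
SCREEN_SIZE = 10  # line numbers on a display run from 1 to 10


class SaveArea:
    """An ordered list of (term, count) entries saved from index displays."""

    def __init__(self):
        self.terms = []

    def save(self, screen_rows, line_no):
        """S x: add line x of the current display to the end of the list."""
        if not 1 <= line_no <= min(SCREEN_SIZE, len(screen_rows)):
            raise ValueError("syntax error: line number out of range")
        self.terms.append(screen_rows[line_no - 1])

    def remove(self, line_no):
        """R x: delete the x-th term from the list."""
        if not 1 <= line_no <= len(self.terms):
            raise ValueError("syntax error: line number out of range")
        del self.terms[line_no - 1]


if __name__ == "__main__":
    rows = [("LIBRARIES $X AUSTRALIA", 3), ("LIBRARIES $X AUTOMATION", 2)]
    area = SaveArea()
    area.save(rows, 2)
    area.save(rows, 1)
    area.remove(1)
    print(area.terms)
```

Because the saved entries carry their counts, the list can be displayed in the same layout as an index file display.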



3.2.7 Print

The Information Processing Laboratory as yet does not have a direct
facility for producing printed versions of selected terminal displays. In
order to accomplish this useful function, the line printer of the 360/40 at
the Campus Computer Center is utilized. The command to create a hard copy
version of a BROWSER2 display is:

     H.

The command results in the printing of the current display, whether that
display currently consists of index file entries or of the Save Area List.

3.2.8 Close or Exit

Two exit options are available. The current index file may be closed.
The command for this is

     //CLOSE.

This will terminate examination of the current index file and will re-
initiate the question:

     SPECIFY FILE NAME.

At this point another file may be opened for browsing. Closing a file does
not affect the contents of the Save Area List.

To leave the program entirely and return to TMS the command

     //EXIT

is used. This results in the termination of all BROWSER2 operations, including
the purging of the Save Area List.

3.2.9 Display the Available Commands or File Names

If the user would like to review the commands available, he may
enter:

     //HELP.

This will result in a redisplay of the page defining the commands and their
uses. In a similar manner, the available file names will be displayed in
response to the user's command:

     //LIST.

With these commands at his disposal, the user can master the use of BROWSER
rapidly.
4. USERS' GUIDE TO FILE BUILDING



4.1 Overview

A core set of three data base programs, FILOR, ZODIAC, and PAX, and an
IBM utility sort are required to establish the three-level linked file
structure which is searched by CIMARON2, the on-line retrieval routine. The
three-level file structure that is established on disk consists of a search
key file, an intermediate address file, and a master file. An index-
sequential file is at the highest level and each of its records contains a
search key and an address (link) to the next level file. If but one master
record is referenced by a given search key, the associated address points to
that master record. Otherwise, it points to a record in the intermediate
address file. At the second level is an address file which has sets of
addresses of master file records associated with a given search key in the
level one file. At the third and last level of the file structure is the
master file itself. It consists of a sequential array of master bibliographic
records stored in a direct-access file. Any given record in this file can be
accessed by the address obtained from the other files.
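
The way a search key is resolved through the three levels can be sketched as follows. This is an illustrative model only: the dictionaries stand in for the disk files, and the `("master", ...)` and `("address", ...)` link tags are hypothetical labels for the distinction that the real files carry in their pointer fields.

```python
def lookup(search_key, key_file, address_file, master_file):
    """Return the master records linked to search_key.

    key_file:     search key -> (kind, value); kind is "master" when the link
                  points straight at one master record, "address" when it
                  points at a set in the intermediate address file.
    address_file: address set id -> list of master record addresses
    master_file:  master record address -> record
    """
    link = key_file.get(search_key)
    if link is None:
        return []
    kind, value = link
    if kind == "master":
        addresses = [value]              # only one record: direct link
    else:
        addresses = address_file[value]  # several records: go via level two
    return [master_file[a] for a in addresses]


if __name__ == "__main__":
    master = {10: "record A", 11: "record B", 12: "record C"}
    addresses = {0: [10, 12]}
    keys = {"LIBRARIES": ("address", 0), "MUSIC": ("master", 11)}
    print(lookup("LIBRARIES", keys, addresses, master))
    print(lookup("MUSIC", keys, addresses, master))
```

The direct link for single-record keys saves one file access for what is, in a bibliographic file, a very common case.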

By dividing the file construction operations into separate, small, func-
tionally oriented routines, the set of routines can be used flexibly in order
both to carry out the file construction operations on a wide variety of files
and to provide a wide variety of indexing to the individual records. This
is not to say that any file could be utilized readily without modification of
the routines, but rather that when modifications are required, it is a straight-
forward matter to identify the affected component routine and the nature of
the change that would be required.

For example, ZODIAC, the routine which searches master bibliographic
records and creates the individual search keys desired, presently is organized
to obtain the master records from a random access disk file via an intermediate
index file. If it were desired to obtain search keys from records stored in
a sequential file for which no intermediate index were available, it would be
necessary to modify only that segment of ZODIAC which is concerned with obtain-
ing the next logical record from the master file.

As another example, if it were desired to utilize bibliographic records
which are not in the MARC structure, it would not be necessary to modify any
of the routines in the file building system except ZODIAC. It would be
necessary to develop a new routine to replace ZODIAC since it is heavily
dependent upon the MARC structure. And finally, only the display "CSECT" of
CIMARON2 would have to be modified in order to store, search, and retrieve
these records appropriately.

4.2 Creation of the Bibliographic Master File

The first step in the file creation sequence is performed by the program
known as FILOR. This program takes as input a file containing source master
records and provides two files as output: the first is a direct-access master
file which ultimately will be the third level of the file structure, and the
second is a sequential finder file in which each record contains the master
file record accession number and the disk address at which that record can be
found. At present the input file is defined as a sequential file of variable
length master records, either blocked or unblocked, with standard IBM conven-
tion on placement of the record length information. However, with minor
modification, the program could run on fixed length records as well.

The first output file is a direct access file and the variable run para-
meters in this file are the block size (BLKSIZE) and the logical record
length (LRECL).* These two parameters in association with the UNIT and VOLUME
parameters would enable this file to be established on a variety of direct-
access secondary storage media.** At present, if a master record will not fit
within a physical block, it is segmented and stored in two contiguous blocks.

The finder file, the other output file produced by this program, is a sequential
file of blocked records. The block size is again a variable parameter, as are
UNIT and VOLUME. This file thus can be established on any sequentially access-
ible storage medium, and in general a tape file is used.

An important characteristic of the direct-access master file created by
this program has to be mentioned here. A logical record in this file can be
laid out across a block boundary.*** This has been done with the view of
optimizing the packing density in this file. Thus, routines which attempt to
retrieve records from this file would have to make use of "splicing" proce-
dures**** for split records. Split records are those having a first part at
the end of a given block and the second part at the start of the next sequential
block in the file. Information about the size of each part in the two differ-
ent blocks can be obtained easily from the three-part disk address which has
been described earlier. The combination of track number, track offset, and
record length, in conjunction with a knowledge of the capacity of each track
or block is sufficient to determine the sizes of the two parts of a split
record.
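
The arithmetic for a split record is brief enough to state directly. The sketch below is an illustration with a hypothetical function name; it assumes, as the text states, that a record occupies at most two contiguous blocks, and it uses 7294 bytes (the 2314 track capacity, which is also the BLKSIZE shown for the master file in Figure 14) as the example capacity.

```python
def split_sizes(offset, length, capacity):
    """Return (first_part, second_part) sizes in bytes for a master record.

    offset:   byte offset within the track (block) at which the record begins
    length:   total record length
    capacity: usable bytes per track (block)
    A second part of 0 means the record is not split.
    """
    if not 0 <= offset < capacity:
        raise ValueError("offset must lie within the block")
    first = min(length, capacity - offset)  # what fits before the boundary
    return first, length - first            # remainder starts the next block


if __name__ == "__main__":
    print(split_sizes(7000, 500, 7294))  # record crosses the block boundary
    print(split_sizes(0, 500, 7294))     # record fits entirely in one block
```

A retrieval routine would read the first part from the given track and, when the second part is non-zero, read that many bytes from the start of the next sequential track and join the two.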

*For reasons of storage efficiency, it is recommended that BLKSIZE equal
LRECL equal track capacity on disks and drums, i.e., the record format is
fixed (RECFM=F).

**Including tape devices with a block-skip feature.

***In the terminology of IBM access methods such records would be known as
spanned records.

****May now be available in BDAM, RECFM=FS.

Each record of the finder file consists of eighteen bytes (see Figure
13). The first six bytes contain the record (accession) number in EBCDIC,
and the next twelve bytes constitute what is known as the pointer field. This
field essentially consists of an eight-byte disk address, in addition to type and
flag information. The type and flag information in each finder file record
attempt to standardize the file types and the nature of the content of
each file into two or three classes so that future programs can obtain
dynamically the type of the file and the nature of its content and invoke
specialized retrieval routines. In detail then, the first six bytes give the
accession number; the seventh byte in the finder file record is a code indicat-
ing the type of file (e.g., index sequential, direct-access, etc.); the eighth
byte in the record is a code indicating the type of content. Currently three
types of file content are defined: key-type content, address-type content and
data-type content. The master file, for example, would be a file with data-
type content. The ninth and tenth bytes in the record constitute a two-byte
file code. Currently the letters MF have been chosen to indicate the master
file. Other codes are for attributes, for example AU for authors, TI for
titles, SU for subjects, etc. (These codes are established by PAX.) The
remaining eight bytes constitute the true disk address of the master file
record. Of these, the first four bytes give the relative track number in
binary, the next two bytes give the offset into this track at which the master
record begins, and the last two bytes give the length or extent of the master
record. The four-byte track number has been designed towards making use of
the IBM direct block addressing facility. This facility requires a three-byte
relative block number, which in our case happens to be the relative disk track
number.* The fourth byte could be used in the future to indicate the relative
unit on which the file is to be found. This would handle the case of a very
large file which spans a number of disk units. However, CIMARON currently is
established to accept only the value 0.

FIG. 13: FINDER FILE RECORD FORMAT

File Name:      (controlled by JCL of FILOR)

Type of File:   Sequential

Record Length:  18 bytes

Data Type     Field Name and Position

EBCDIC        Accession number (1-6)
              Pointer field (7-18)

EBCDIC        Type [of the file referred to] (7)
              Values: 1,   index sequential
                      2,** direct access
                      3,   sequential

EBCDIC        Type of content (8)
              Values: 0,   key
                      1,   address
                      2,** data

EBCDIC        File code (9-10)
              Values: MF,** master file
                      AU,   author index
                      etc.

Binary        Track location (11-13)

Binary        Unit number*** (14)
              Values: 0, 1st 2314

Binary        Offset (15-16) location of beginning of record

Binary        Record length (17-18)

*If the master file were established on tape, this would be the relative
block number.

**Indicates the standard values set by FILOR.

***Unit number is provided in order to allow storage and access to multiple
disk units. To be put into effect, however, additional unit designations
must be included in CIMARON Table.
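
The byte layout of Figure 13 can be read off directly in code. The following is a minimal sketch, not part of FILOR or CIMARON: the names are hypothetical, Python's `cp037` codec is assumed for the EBCDIC fields, and binary fields are taken to be big-endian, which is the byte order of System/360.

```python
from typing import NamedTuple


class FinderRecord(NamedTuple):
    accession: str     # bytes 1-6, EBCDIC
    file_type: str     # byte 7, EBCDIC
    content_type: str  # byte 8, EBCDIC
    file_code: str     # bytes 9-10, EBCDIC (e.g. MF, AU)
    track: int         # bytes 11-13, binary relative track number
    unit: int          # byte 14, binary unit number
    offset: int        # bytes 15-16, binary offset of record within track
    length: int        # bytes 17-18, binary record length


def parse_finder_record(raw: bytes) -> FinderRecord:
    """Split one 18-byte finder file record into its fields."""
    if len(raw) != 18:
        raise ValueError("a finder file record is 18 bytes long")
    return FinderRecord(
        accession=raw[0:6].decode("cp037"),
        file_type=raw[6:7].decode("cp037"),
        content_type=raw[7:8].decode("cp037"),
        file_code=raw[8:10].decode("cp037"),
        track=int.from_bytes(raw[10:13], "big"),
        unit=raw[13],
        offset=int.from_bytes(raw[14:16], "big"),
        length=int.from_bytes(raw[16:18], "big"),
    )


if __name__ == "__main__":
    raw = ("000123" + "2" + "2" + "MF").encode("cp037") \
        + (5).to_bytes(3, "big") + bytes([0]) \
        + (100).to_bytes(2, "big") + (250).to_bytes(2, "big")
    print(parse_finder_record(raw))
```

The last twelve bytes of the record (everything after the accession number) form the pointer field that is copied unchanged into later index records.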





FIG. 14: ZODIAC RUN SETUP

Job Control Language (JCL):

//B5844JAZ JOB (5844315550500),'ILR-CUNNINGHAM',MSGLEVEL=1,CLASS=L
//GO EXEC PGM=ZODIAC2
//STEPLIB DD UNIT=2314,DSN=ILR.BATCHLIB,DISP=SHR
//GO.SYSUDUMP DD SYSOUT=A
//GO.LONEACC DD UNIT=2314,VOL=SER=ILR02,DSN=ILR.SCAC3,DISP=OLD,
//     DCB=(RECFM=FB,BLKSIZE=1800,LRECL=18)
//GO.MASTERR DD UNIT=2314,VOL=SER=(ILR03,ILR05),DSN=ILR.SCMF2,
//     DCB=(DSORG=DA,RECFM=F,BLKSIZE=7294),DISP=OLD
//GO.PRINT DD SYSOUT=A,DCB=(RECFM=FB,LRECL=25,BLKSIZE=750)
//GO.AUTHORS DD UNIT=TAPE,DCB=(RECFM=FB,LRECL=94,BLKSIZE=1692,
//     TRTCH=C,DEN=2),LABEL=(1,BLP),
//     VOL=SER=(4117R,4118R),DISP=(NEW,KEEP),DSN=SCAUTH
//GO.TITLES DD UNIT=TAPE,DCB=(RECFM=FB,LRECL=94,BLKSIZE=1692,
//     TRTCH=C,DEN=2),LABEL=(1,BLP),
//     VOL=SER=(4119R),DISP=(NEW,KEEP),DSN=SCTITLE
//GO.SUBJECT DD UNIT=TAPE,DCB=(RECFM=FB,LRECL=94,BLKSIZE=1692,
//     TRTCH=C,DEN=2),LABEL=(1,BLP),
//     VOL=SER=(4121R,4122R),DISP=(NEW,KEEP),DSN=SCSUBJ
//GO.CARDIN DD *,DCB=BLKSIZE=80
AUTHORS 100.4,110.4,111.4,700.4,710.4,711.4
TITLES 130.4,240.4,245.4,440.4,730.4,740.4,840.4
SUBJECT 600.4,610.4,611.4,630.4,650.4,651.4,660.4
/*
//

INPUT FILES:

Master file - DSNAME=ILR.SCMF2    refer DD=MASTERR
Finder file - DSNAME=ILR.SCAC3    refer DD=LONEACC
Controls    - none at present

OUTPUT FILES:

Authors     - DSNAME=SCAUTH       refer DD=AUTHORS
Titles      - DSNAME=SCTITLE      refer DD=TITLES
Subjects    - DSNAME=SCSUBJ       refer DD=SUBJECT
Controls    - Through card input  refer DD=CARDIN
Controls - 'nirough card input refer DDsCAEDIN 
The control input for the program FILOR is indicated on cards as shown
in Figure 15. The information supplied on these cards is as follows: the
number of records that are to be appended to the master file, the relative
track number, and the track offset at which the records are to be appended.

In the case of a master file which is being established on disk for the
first time, the track number and offset would be zero, indicating it is being
appended "from the beginning." In the case of update runs, track number and
offset in the master file would be obtained from the run statistics of the
previous run, and this would be input as control to the program so that the
file is appended at the right point.

4.3 Extraction of the Index Information 

The next program in the file creation sequence is known as ZODIAC. This
program has the function of extracting various fields from MARC records as
attributes of the record, on which index files will subsequently be established.
The program is tied to the MARC II format* in that each record
is expected to have a MARC structured leader and directory through which the
variable fields are obtained. The variable fields which will be used to
establish the index files are selectively extracted by the program. The
program has two input files and a variable number of output files, besides the
controlling input which is supplied on cards. The two input files are the
same files which are produced by FILOR in the first step of the file creation
process, viz., the sequential finder file and the direct-access master file. The
controlling tables which are set up as a result of reading in the parameter
cards determine the number of sequential output files created and their content
(see Figure 14).

The structure of every record in an output file is the same: a 94-byte
record of which the first eighty bytes contain the variable field which
has been extracted for the purposes of establishing an index. Since only
eighty bytes are allowed,** there may be cases where the field is truncated.
The remaining fourteen bytes in each record consist of two parts. The first
two bytes represent the MARC tag (in binary) identifying the type of variable
field, and the last twelve bytes are the pointer field (see Figure 13) picked
up from the last twelve bytes of the finder file entry for this particular
MARC record. Constructing an output file record in this manner ensures
that any variable field which is extracted from a given MARC record is
firmly associated with an address 'pointer' to the MARC record itself and with
the tag identifying this variable field in the MARC record.
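
The record layout just described can be sketched in a few lines of Python. This is only an illustration, not the ZODIAC code: the function name is hypothetical, the text is treated as ASCII rather than the character code of the original machine, and the two-byte tag is assumed to be stored high-order byte first.

```python
import struct

KEY_LEN = 80       # bytes reserved for the extracted variable field
POINTER_LEN = 12   # bytes of pointer copied from the finder file entry
RECORD_LEN = KEY_LEN + 2 + POINTER_LEN   # 94 bytes in total


def build_index_record(field: bytes, tag: int, pointer: bytes) -> bytes:
    """Assemble one fixed-length output record (hypothetical helper)."""
    if len(pointer) != POINTER_LEN:
        raise ValueError("pointer field must be exactly 12 bytes")
    # Long fields are truncated; short ones are padded with blanks.
    key = field[:KEY_LEN].ljust(KEY_LEN, b" ")
    # Assumption: the tag is a two-byte binary number, big-endian.
    return key + struct.pack(">H", tag) + pointer


record = build_index_record(b"Maron, M. E.", 100, bytes(range(12)))
```

Every record produced this way has the same length, so a field longer than eighty bytes loses its tail while the tag and pointer always remain in the same positions.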

4.3.1 Parameter Control

A parameter card consists of the name of the output file followed by a 
sequence of three-digit tags, delimited by commas. The tags indicate the 
variable fields that have to be extracted from each MARC record and routed to 
a given file whose name is supplied on the card. For example (see Figure 14),
a parameter card which could be supplied as controlling input to ZODIAC may



*See "Specifications for Magnetic Tapes Containing Monographic Catalog Records
in the MARC II Format," in Books: A MARC Format, Washington, D.C.: Library
of Congress, Information Systems Office, April 1970.

**This is a reprogrammable parameter.





FIG. 15: FILOR RUN SETUP

Job Control Language (JCL):

//B58UUJAF JOB (58UL,15,50,00),'ILR-CUNNINGHAM',MSGLEVEL=1,CLASS=L
//GO EXEC PGM=FILOR2,COND=EVEN
//STEPLIB DD UNIT=2314,DSNAME=ILR.BATCHLIB,DISP=SHR
//GO.SYSUDUMP DD SYSOUT=A
//GO.CARDIMIN DD UNIT=TAPE,VOL=SER=(3717,3760,3780),LABEL=(1,BLP),
// DCB=(RECFM=VB,LRECL=2048,BLKSIZE=3600,TRTCH=C,DEN=2),
// DISP=(OLD,KEEP)
// DD UNIT=TAPE,VOL=SER=(3725,3766,3787),LABEL=(1,BLP),
// DCB=(RECFM=VB,LRECL=2048,BLKSIZE=3600,TRTCH=C,DEN=2),
// DISP=(OLD,KEEP)
//GO.LONEACC DD DCB=(BLKSIZE=1800,LRECL=18,RECFM=FB),UNIT=2314,
// VOLUME=SER=ILR02,DISP=(NEW,KEEP),DSN=ILR.SCACC2,
// SPACE=(CYL,(15,1),RLSE)
//GO.MASTERR DD UNIT=2314,DISP=(NEW,KEEP),
// VOL=(,,,2,SER=(ILR03,ILR05)),
// SPACE=(CYL,(199,15),RLSE,CONTIG),DSN=ILR.SCMARC2,
// DCB=DSORG=DA
/*
//



INPUT FILES:

Master file - DSNAME=                refer DD=MASTERR

Controls    - none at present

OUTPUT FILES:

Master file - DSNAME=ILR.SCMARC2     refer DD=MASTERR
Finder file - DSNAME=ILR.SCACC2      refer DD=LONEACC

consist of the word AUTHORS in the first seven columns of the card followed
by the tag 100 beginning at column ten.* This would indicate that all
variable fields associated with tag 100 (which identifies a personal author
entry) be routed to an 'authors' file. The digit following the tag specifies
an initial offset in the variable field. This is to skip over binary
indicators and codes which may occur at the start of the field. Each record in
the output file SCAUTH would consist of a personal author name (blank filled)
in the first eighty bytes, followed by the tag 100 in binary in the next two
bytes, followed by the twelve-byte pointer field (see Figure 16).
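
As an illustration of the card layout described above, the following Python sketch splits one parameter card into a file name and a list of (tag, offset) pairs. The function name is hypothetical; the column positions follow the description in the text (name in the first eight columns, column 9 ignored, tags from column 10), and a tag written without an offset is assumed to mean an offset of zero.

```python
def parse_parameter_card(card: str):
    """Return (file_name, [(tag, offset), ...]) for one card image."""
    name = card[:8].strip()      # columns 1-8: output file name
    body = card[9:].strip()      # column 9 is ignored; tags start in column 10
    entries = []
    for item in body.split(","):
        item = item.strip()
        if not item:             # tolerate a trailing comma or an empty body
            continue
        tag_text, _, offset_text = item.partition(".")
        # Assumption: a missing offset means "skip nothing".
        entries.append((int(tag_text), int(offset_text or 0)))
    return name, entries


authors = parse_parameter_card("AUTHORS  100.4,110.4,111.4,700.4,710.4,711.4")
```

Here `authors` holds the file name AUTHORS together with six tags, each carrying an offset of four bytes.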

FIG. 16: ZODIAC Internal Control Tables

NAME      maximum of twelve eight-byte entries, one entry for each
          output file

CALTEX    maximum of twelve one-byte entries

MARCEL    850 one-byte entries, one for each possible MARC tag; each byte
          is a possible index to an entry in the NAME table

TAGOFF    850 one-byte entries; each byte gives an offset from the
          beginning of the field, to skip fixed-length control (e.g.,
          $a)



4.3.2 ZODIAC Control Tables

It might be instructive to describe the tables which are used to drive
this program. There are four important tables in this program: NAME, CALTEX,
MARCEL and TAGOFF. These tables are summarized in Figure 16. The first is
known as NAME, and this table can have twelve eight-byte entries. This is a
table of the twelve possible output file names, each of which can be eight
characters long. The next table is known as CALTEX, and this can hold twelve
eight-bit flag entries. The next table is known as MARCEL, which has 850
positions, each of which is one byte wide. This table reserves one entry
for each possible MARC tag. MARC tags currently run from 000 to 850; thus
we have 851 positions in this table. The table is initialized to blanks, and
after the parameter cards have been read in, a given entry in this table
contains a one-byte index into the NAME table. This defines a given tag as
being required in a given output file. A table which is parallel to this
table is known as TAGOFF, which also consists of 850 one-byte entries. Each
entry in this table gives an offset from the beginning of the variable field
in order to skip over fixed-length control fields at the head of the variable
field. The offset table is set up in parallel with MARCEL, as a result of
reading in the parameter cards. On the parameter card each tag is followed
by the offset, which is supplied within two digits and delimited from the
tag by a period. This two-digit offset indicates the number of bytes that
have to be skipped over from the beginning of the variable field in order to
get at the data which is required in the output file. For example, we
currently skip over the sub-field delimiter and the sub-field code at the
head of variable fields. The other sub-field delimiters and sub-field
codes within the variable field itself will be carried as part of the output
file. The table known as CALTEX is associated with the programming logic.



*The file name can be up to eight characters long in accordance with O.S.
conventions. Column 9 is ignored. Tags must begin in column 10.

This table is initialized to zero and indicates by a non-zero entry the fact
that records have been output to a given file previously. Thus, on the first
time that a given tag is encountered in a record and found to be required
in a given output file, a special routine finds a zero entry for that file in
CALTEX, obtains working storage for that file, opens the file, and performs
other initialization procedures. CALTEX is a table of flags which indicates
whether the initialization procedures have been performed or not.
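
The interplay of the four tables can be sketched as follows. This is an illustrative Python model, not the program itself: the function names are hypothetical, `None` stands in for the blank that marks an unwanted tag, the tables are sized for tags 000 through 850, and "opening" a file is reduced to appending its name to a list.

```python
MAX_TAG = 850
UNUSED = None   # stand-in for the blank entry of an unwanted tag


def build_tables(cards):
    """cards: list of (file_name, [(tag, offset), ...]) pairs."""
    name = [file_name for file_name, _ in cards]   # NAME
    caltex = [0] * len(name)                       # CALTEX: 0 = not yet opened
    marcel = [UNUSED] * (MAX_TAG + 1)              # MARCEL: tag -> index in NAME
    tagoff = [0] * (MAX_TAG + 1)                   # TAGOFF: tag -> bytes to skip
    for index, (_, entries) in enumerate(cards):
        for tag, offset in entries:
            marcel[tag] = index
            tagoff[tag] = offset
    return name, caltex, marcel, tagoff


def route(tag, tables, opened):
    """Return (file_name, offset) for a wanted tag, else None."""
    name, caltex, marcel, tagoff = tables
    index = marcel[tag]
    if index is UNUSED:
        return None
    if caltex[index] == 0:            # first record destined for this file
        opened.append(name[index])    # placeholder for open and initialization
        caltex[index] = 1
    return name[index], tagoff[tag]


tables = build_tables([("AUTHORS", [(100, 4), (700, 4)]), ("TITLES", [(245, 4)])])
opened = []
first = route(100, tables, opened)
```

The flag in CALTEX means each file is initialized once, the first time one of its tags is actually met, rather than for every card that was read.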

4.3.3 Processing Sequence

We might end by briefly describing the typical sequence of operations
that take place on reading in a particular MARC record. First a record is
read in from the finder file, and the portion of the finder file record which
gives the record address is used to issue a read to the direct-access master
file. This brings a particular MARC record into main storage.
Next, the MARC record directory is sequentially scanned from beginning to end
and the tables simultaneously consulted. As each tag is encountered the table
MARCEL is looked into, and a non-blank entry indicates that the variable field
associated with this tag is required in a given output file. The name of
this output file is indirectly known via the index quantity in the current
entry of MARCEL. By making use of this index the flag table, namely CALTEX,
is indexed and it is determined whether this file has been initialized for
output or not. In the event that it has not been initialized for output,
initialization procedures are performed and then one goes on to the subsequent
portions of the routine. These subsequent portions pick up the relative
address of the variable field from the current entry in the record directory
and move the variable field out to the output record buffer, where the 94-byte
logical record described earlier is constructed and written out to the
associated output file. After all the directory entries in the current MARC
record have been scanned, this procedure is repeated by going back to the
finder file and finding the address of the next MARC record.
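
The loop described above can be sketched in Python under several simplifying assumptions, all of them illustrative: the master file is modeled as a dictionary keyed by the twelve-byte pointer, each MARC record is reduced to a list of (tag, field) pairs standing in for its directory, the tables are plain dictionaries, and the tag is packed high-order byte first.

```python
import struct


def extract(finder_entries, master, marcel, tagoff, key_len=80):
    """Yield (file_index, 94-byte record) pairs in finder-file order."""
    for entry in finder_entries:
        pointer = entry[-12:]            # last twelve bytes of the finder entry
        directory = master[pointer]      # stand-in for the direct-access read
        for tag, field in directory:     # scan the directory front to back
            file_index = marcel.get(tag)
            if file_index is None:       # tag not requested on any card
                continue
            data = field[tagoff.get(tag, 0):]   # skip leading control bytes
            key = data[:key_len].ljust(key_len, b" ")
            yield file_index, key + struct.pack(">H", tag) + pointer


ptr1, ptr2 = b"A" * 12, b"B" * 12
finder = [b"000001" + ptr1, b"000002" + ptr2]    # 18-byte finder entries
master = {
    ptr1: [(100, b"10\x1faMaron, M. E."), (245, b"10\x1faFile organization")],
    ptr2: [(260, b"0 \x1faBerkeley"), (100, b"10\x1faShoffner, R. M.")],
}
output = list(extract(finder, master, {100: 0, 245: 1}, {100: 4, 245: 4}))
```

In this small run three records are produced: two for file 0 (tag 100) and one for file 1 (tag 245). The field under tag 260 is passed over because no card asked for it.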

4.3.4 Program Constraints

There are two limitations on the number and nature of output files that
can be created in one run. First, the number of output files that ZODIAC can
create is variable up to a maximum of twelve. This is a limitation of the
table structure in the program, because the first step in the program is to
read the parameter cards and create tables which are subsequently used to
drive the program in its variable field extraction procedures. The second
limitation is that there must be an exclusive partition of the tags across
the various output files; in other words, a given tag cannot appear in more
than one output file-defining parameter card. This means that a MARC variable
field can be routed to one and only one output file.
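
Both constraints are easy to check before a run. The sketch below is a hypothetical pre-check in Python, not part of ZODIAC; it takes the cards as (file name, list of tags) pairs and rejects a deck that names more than twelve files or assigns one tag to two different files.

```python
def check_cards(cards, max_files=12):
    """Return a tag -> file_name map, or raise ValueError on a bad deck."""
    if len(cards) > max_files:
        raise ValueError(
            f"{len(cards)} output files requested; limit is {max_files}")
    owner = {}
    for file_name, tags in cards:
        for tag in tags:
            # A tag may be claimed by one output file only.
            if tag in owner and owner[tag] != file_name:
                raise ValueError(
                    f"tag {tag:03d} assigned to both "
                    f"{owner[tag]} and {file_name}")
            owner[tag] = file_name
    return owner


owner = check_cards([("AUTHORS", [100, 700]), ("TITLES", [245])])
```

A deck in which tag 100 appears on both an AUTHORS card and a SUBJECT card would be rejected, since that field could then not be routed to exactly one file.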

The execution of this program is also controlled by the finder file in
the following manner: as many records are brought into main storage and
analyzed as there are entries for them in the finder file, and the finder file
is accessed sequentially from beginning to end. A possible future control on
this program would be to define contiguous subsets of the master file to be
analyzed by ZODIAC, through the incorporation of record skip and record count
controls.

Currently the program is tied to the format of the finder file (18-byte
records) and also the format of the output files (94-byte records). By
reassembly, the length of the output records can be changed, in order to
change the length of the key field carried in the output records. This length
can be anywhere from 1 byte up to the O.S. limit on the size of the key in an
index-sequential file, which is 256 bytes, since the variable field is
eventually to be used as the key in such a file.

4.4 Sequencing the Index Data 

The next step in the file creation procedure is executed by the IBM sort
utility. Each of the output files produced as a result of a run of ZODIAC
is passed through the sort utility, which sorts the records so that the key
portions (the first 80 bytes) are in alphabetical sort order. Each run of this
program takes one input file, sorts it and produces one output file, which is
then ready for the final step in the procedure.

The controls supplied to the IBM sort utility are as follows: the
record length, the offset from the beginning of the record to the sort key
in each record, the length of the sort key, and the sort order, namely
ascending or descending. The sort utility also has provision to specify the
sort key in two parts. This in fact is being done currently with the records
output from ZODIAC. As mentioned earlier, each of these records consists of
94 bytes, the first 80 of which hold the variable field and the last twelve
of which contain the pointer field, which contains the disk address of the
master file record. The sort is currently being performed primarily on the
variable field and secondarily on the twelve-byte pointer field. This ensures
that any collection of pointers in the level two address file of the file
structure, associated with some key in the level one file, will be in
ascending order of master file address. This buys a little efficiency in the
running of the retrieval routines. One of the first things that is done on
retrieving a list of addresses from disk is to put them in a sort order, so
that comparisons can be done between two lists. So by specifying this
secondary sort key one can save the initial sort on reading in a list of
addresses from disk.
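
The value of the secondary sort key can be illustrated with a short Python sketch. The function names are hypothetical, and Python compares bytes by their numeric value, which need not match the collating sequence of the IBM sort utility. The first function orders records on the two-part key; the second shows the kind of single-pass comparison of two address lists that becomes possible once both lists are already in ascending order.

```python
def sort_index_records(records):
    """Sort by the 80-byte key first, then by the 12-byte pointer."""
    return sorted(records, key=lambda rec: (rec[0:80], rec[82:94]))


def intersect_sorted(first, second):
    """Return the pointers present in both ascending lists, in one pass."""
    common = []
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] == second[j]:
            common.append(first[i])
            i += 1
            j += 1
        elif first[i] < second[j]:
            i += 1
        else:
            j += 1
    return common


p1 = b"\x00" * 11 + b"\x01"
p2 = b"\x00" * 11 + b"\x02"
p3 = b"\x00" * 11 + b"\x03"
tag = b"\x00\x64"   # tag 100 as two binary bytes
records = [b"BAKER".ljust(80) + tag + p3,
           b"ADAMS".ljust(80) + tag + p2,
           b"BAKER".ljust(80) + tag + p1]
ordered = sort_index_records(records)
```

After sorting, the two BAKER records sit together with their pointers in ascending order, so the pointer list later stored for that key needs no further sorting when it is read back.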

4.5 Creation of the Index Files 



The final step of the file creation procedure is performed by a program
called PAX. This program has one input file and produces two output files.
The input file is a sequential file and it is the output of the IBM sort
utility. The two output files produced by this program are, respectively, the
level one index-sequential access file and the level two direct-access
address file. If a pair of these files is established for a given attribute
in a record collection, it will enable the master file collection to be
searched via this attribute.

The process performed by the program PAX and indeed much of its logical
structure is quite similar to that of FILOR. The points of difference lie in
the format and content of files handled by this program. The level one
index-sequential file consists of records which have two important parts. The
first part is the key and this key will be the argument for searches performed
on this file. The second part of the record consists of a twelve-byte pointer
field, which will point to a collection of twelve-byte pointer fields in
the level two address file. The length of the record keys of this first
level file is variable up to a maximum of 80. On loading different files,
this can be varied without reassembly of the program, since the key length is
supplied as a PAX run-time parameter. After the file is opened (e.g., by
CIMARON) this parameter is available in the data control block for the file,
and thus files can be created with records of different length and the using
program can be adjusted to the key length of the specific file.

Currently the two-byte binary tag which is carried in the input records
does not appear in the index-sequential file. At a later time, it may be
used to further 'refine' attributes (to the tag level) during search.

One of the important functions performed by PAX is to make sure that only
one index-sequential record is created for each unique key in the input file.
This is established on the basis of a comparison between the 80-byte variable
field portion of the current input record and the corresponding field in the
previous record. At the point when a mismatch is detected the program will
create a new record in the access file, that is, the level one file. This
record would consist of a key constructed from the variable field portion
of the previous record followed by a twelve-byte pointer field which points
to the location in the level two file at which can be found a sequence of
twelve-byte pointer fields identifying all the MARC records in which this
particular variable field appeared.

It is important to note that the structure of the pointer field is
uniform throughout the system, and it is by this field that links are
established across the levels of the file structure. Specifically, the
pointer field in a record in the level one file establishes a link to a
series of pointer fields in the level two file. Each pointer field in the
level two file establishes a direct link to a master record in the level three
file. This, briefly, is the file structure employed to search master file
records via indexed attributes.

The creation of entries in the level two file proceeds in parallel with
the reading in of input records. As each input record is read in and its
sort key found to be the same as that of the previous record, an additional
entry (a twelve-byte pointer field entry) is made in the level two
direct-access file. This additional entry is in fact the pointer field in the
current input record. At the end of a sequence of identical sort keys on input
records, it is time to create a new record in the level one file for the
collection of input records which have the same variable field. The address at
which a sequence of master record addresses can be found associated with this
collection of similar variable fields is entered in the level one file.



Where there is but one master record linked to a given sort key, the
level two file is bypassed. Instead, PAX establishes a direct link from the
record in the level one file to the master record in the level three file, in
all cases where the sort key in the level one file is uniquely associated
with a single master file record in which it appears. This enables one to
bypass both the construction and subsequent reading of a record in the level
two file during search. Thus, the speed of the search process is increased
for such keys.
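
The grouping performed in this step can be sketched in Python as follows. This is an illustration under stated assumptions, not the PAX code: the function name is hypothetical, the two files are modeled as in-memory lists, and the way a level one entry distinguishes a direct link from a link into the level two file (the strings "master" and "level2" below) is invented for the example, since the text does not say how the two cases are told apart. `itertools.groupby` compares adjacent records only, which matches the comparison of each record with its predecessor in sorted input.

```python
from itertools import groupby


def build_index(sorted_records, key_len=80):
    """Return (level_one, level_two) built from sorted 94-byte records."""
    level_one = []   # one entry per unique key
    level_two = []   # flat sequence of master-file pointers
    for key, group in groupby(sorted_records, key=lambda rec: rec[:key_len]):
        pointers = [rec[-12:] for rec in group]
        if len(pointers) == 1:
            # Unique key: link straight to the master record.
            level_one.append((key, ("master", pointers[0])))
        else:
            # Shared key: record where its pointer run starts and its length.
            level_one.append((key, ("level2", len(level_two), len(pointers))))
            level_two.extend(pointers)
    return level_one, level_two


tag = b"\x00\x64"   # tag 100 as two binary bytes
records = [b"ADAMS".ljust(80) + tag + b"A" * 12,
           b"BAKER".ljust(80) + tag + b"B" * 12,
           b"BAKER".ljust(80) + tag + b"C" * 12]
level_one, level_two = build_index(records)
```

In this run ADAMS is linked directly to its single master record, while BAKER points to a run of two pointers in the level two list.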





In the future, it may be possible to dispense with the level two file
altogether. In this improved file structure, all of the master file addresses
associated with a key would be found in the level one file entry for that key.
However, this will be possible only when the IBM operating system supports
variable-length records in index-sequential files. Currently only
fixed-length records are supported in these files.